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Why This Book? 



There are plenty of books that teach introductory data structures. Some of them are very 
good. Most of them cost money and the vast majority of computer science undergraduate 
students will shell-out at least some cash on a data structures book. 

There are a few free data structures books available online. Some are very good, but 
most of them are getting old. The majority of these books became free when the author 
and/or publisher decided to stop updating them. Updating these books is usually not 
possible, for two reasons: (1) The copyright belongs to the author or publisher, who may 
not allow it. (2) The source code for these books is often not available. That is, the Word, 
WordPerfect, FrameMaker, or MpX source for the book is not available, and the version of 
the software that handles this source may not even be available. 

The goal of this project is to forever free undergraduate computer science students 
from having to pay for an introductory data structures book. I have decided to implement 
this goal by treating this book like an Open Source software project. The MpX source, Java 
source, and build scripts for the book are available for download on the book's website 
(opendatastructures.org) and also — more importantly — on a reliable source code 
management site (https : / /github . com/patmorin/ods). 

This source code is released under a Creative Commons Attribution license, mean- 
ing that anyone is free 

• to Share — to copy, distribute and transmit the work; and 

• to Remix — to adapt the work. 

This includes the right to make commercial use of the work. The only condition on these 
rights is 

• Attribution — You must attribute the work by displaying a notice stating the de- 
rived work contains code and/or text from the Open Data Structures Project and/or 
linking to opendatastructures . org. 



in 



Anyone can contribute corrections/fixes using the git source-code management 
system. Anyone can fork from the current version of the book and develop their own 
version (for example, in another programming language). They can then ask that their 
changes be merged back into my version. My hope is that, by doing things this way this 
book will continue to be a useful textbook long after my interest in the project (or my 
pulse, whichever comes first) has waned. 
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Chapter 1 
Introduction 



Every computer science curriculum in the world includes a course on data structures and 
algorithms. Data structures are that important; they improve our quality of life and even 
save lives on a regular basis. Many multi-million and several multi-billion dollar compa- 
nies have been built around data structures. 

How can this be? If we think about it for even a few minutes, we realize that we 
interact with data structures constantly. 

• Open a file: File system data structures are used to locate the parts of that file on 
disk so they can be retrieved. This isn't easy; disks contain hundreds of millions of 
blocks. Your file could be stored on any one of them. 

• Look up a contact on your phone: A data structure is used to lookup a phone number 
based on partial information even before you finish dialing/typing. This isn't easy; 
this information may be looked up in a data structure containing information about 
everyone you have ever had phone or email contact with, and your phone doesn't 
have a very fast processor or a lot of memory. 

• Login to your favourite social network: The network servers use your login informa- 
tion look up your account information. This isn't easy; popular social networks have 
nearly a billion active users. 

• Do a web search: The search engine uses data structures to find the web pages con- 
taining your search terms. This isn't easy; there are over 8.5 billion web pages on the 
Internet and each page contains a lot of potential search terms. 

• Phone emergency services (9-1-1): The emergency services network looks up your 
phone number in a data structure that maps phone numbers to addresses so that 
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police cars, ambulances, or fire trucks can be sent there without delay. This is impor- 
tant; the person making the call may not be able to provide the exact address they 
are calling from and a delay can mean the difference between life or death. 

In the next section, we look at the operations supported by the most commonly 
used data structures. Anyone with even a little bit of programming experience will see 
that these operations are not hard to implement. We can store the data in an array or a 
linked list and any operation can be implemented by iterating over all the elements of the 
array or list and possibly adding or removing an element. 

Although this kind of implementation is easy, it is not very efficient. Imagine an 
application with a moderately-sized data set, say of one million (10 6 ), items. It is rea- 
sonable, in most applications, to assume that the application will want to look up each 
item at least once. This means we can expect to do at least one million (10 6 ) lookups in 
this data. If each of these 10 6 lookups inspects each of the 10 6 items, this gives a total of 
10 6 x 10 6 = 10 12 (one thousand billion) inspections. 

At the time of writing, even a very fast desktop computer can not do more than 
one billion (10 9 ) operations per second. 1 This means that this application will take at least 
10 12 /10 9 = 1000 seconds, or roughly 16 minutes and 40 seconds. 16 minutes is an eon in 
computer time, but a person might be willing to put up with it (if they were headed out 
for a coffee break). 

Now consider a company like Google, that indexes over 8.5 billion web pages. By 
our calculations, doing any kind of query over this data would take at least 8.5 seconds. 
We already know that this isn't the case; web searches complete in much less than 8.5 
seconds, and they do much more complicated queries than just asking if a particular page 
is in their list of indexed pages. Google receives approximately 4, 500 queries per second, 
meaning that they would require at least 4, 500 x 8.5 = 38, 250 very fast servers just to keep 
up. 

The preceding examples suggest that operations that require looking at everything 
in the data structure do not scale well when the number of items, n, in the data structure 
and the number of operations, m, performed on the data structure are both large. In these 
cases, the time (measured in, say, machine instructions) is roughly n x m. 

The solution, of course, is to carefully organize data within the data structure so 
that not every operation requires inspecting every item. It may seem counterintuitive, but 



Computer speeds are at most a few gigahertz (billions of cycles per second) and each operation typically 
takes a few cycles. 
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we will see data structures where a search requires looking at only 2 items on average, 
independent of the number of items stored in the data structure. In our billion instruction 
per second computer, these operations take 0.000000002 seconds to execute a search in a 
data structure containing a billion items (or a trillion, or a quadrillion, or even a quintillion 
items). 

We will also see implementations of data structures that keep the items in sorted 
order, where the number of items inspected during an operation grows very slowly as a 
function of the number of items in the data structure. For example, we can maintain a 
sorted set of one billion items while inspecting at most 60 items during any operation. 
In our billion instruction per second computer, these operations take 0.00000006 seconds 
each. 

The remainder of this chapter briefly reviews some of the main concepts used 
throughout the rest of the book. Section 1.1 describes the interfaces implemented by all 
the data structures described in this book. It should be considered required reading. The 
remaining sections discuss asymptotic (big-Oh) notation, probability and randomization, 
the model of computation, an overview of the rest of the chapters, and the sample code 
and typesetting conventions. A reader with or without a background in these areas can 
easily skip them now and come back to them later if necessary. 

1.1 Interfaces 

In discussing data structures, it is important to understand the difference between a data 
structure's interface and its implementation. An interface describes what a data structure 
does, while an implementation describes how the data structure does it. 

An interface, sometimes also called an abstract data type, defines the set of opera- 
tions supported by a data structure and the semantics, or meaning, of those operations. 
An interface tells us nothing about how the data structure implements these operations, it 
only provides the list of supported operations along with specifications about what types 
of arguments each operation accepts and the value returned by each operation. 

A data structure implementation on the other hand, includes the internal represen- 
tation of the data structure as well as the definitions of the algorithms that implement the 
operations supported by the data structure. Thus, there can be many implementations of a 
single interface. For example, in Chapter 2, we will see implementations of the List inter- 
face using arrays and in Chapter 3 we will see implementations of the List interface using 
pointer-based data structures. Each implements the same interface, List, but in different 



1. Introduction 1.1. Interfaces 





add(x)/enqueue(x) remove()/dequeue() 

Figure 1.1: A FIFO Queue. 

ways. 

1.1.1 The Queue, Stack, and Deque Interfaces 

The Queue interface represents a collection of elements to which we can add elements and 
remove the next element. More precisely, the operations supported by the Queue interface 
are 

• add(x): add the value x to the Queue 

• removeQ: remove the next (previously added) value, y, from the Queue and return y 

Notice that the removeQ operation takes no argument. The Queue's queueing discipline 
decides which element should be removed. There are many possible queueing disciplines, 
the most common of which include FIFO, priority, and LIFO. 

A FIFO (first-in-first-out) Queue, illustrated in Figure 1.1, removes items in the same 
order they were added, much in the same way a queue (or line-up) works when checking 
out at a cash register in a grocery store. This is the most common kind of Queue so the 
qualifier FIFO is often ommitted. In other texts, the add(x) and removeQ operations on a 
FIFO Queue are often called enqueue(x) and dequeueQ, respectively. 

A priority Queue, illustrated in Figure 1.2, always removes the smallest element 
from the Queue, breaking ties arbitrarily. This is similar to the way patients are triaged 
in a hospital emergency room. As patients arrive they are evaluated and then placed in a 
waiting room. When a doctor becomes available they first treat the patient with the most 
life-threatening condition. The remove(x) operation on a priority Queue is usually called 
deleteMinQ in other texts. 

A very common queueing discipline is the LIFO (last-in-first-out) discipline, illus- 
trated in Figure 1.3. In a LIFO Queue, the most recently added element is the next one 
removed. This is best visualized in terms of a stack of plates; plates are placed on the top 
of the stack and also removed from the top of the stack. This structure is so common that 
it gets its own name: Stack. Often, when discussing a Stack, the names of add(x) and 
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Figure 1.2: A priority Queue. 

add(x)/push(x) 




remove()/ pop() 



Figure 1.3: A stack. 



remove() are changed to push(x) and pop(); this is to avoid confusing the LIFO and FIFO 
queueing disciplines. 

A Deque is a generalization of both the FIFO Queue and LIFO Queue (Stack). A 
Deque represents a sequence of elements, with a front and a back. Elements can be added 
at the front of the sequence or the back of the sequence. The names of the operations on 
a Deque are self-explanatory: addFirst(x), removeFirst(), addLast(x), and removel_ast(). 
Notice that a Stack can be implemented using only addFirst(x) and removeFirst() while 
a FIFO Queue can be implemented using only addLast(x) and removeFlrst(). 

1.1.2 The List Interface: Linear Sequences 

This book will talk very little about the FIFO Queue, Stack, or Deque interfaces. This 
is because these interfaces are subsumed by the List interface. A List, illustrated in 
Figure 1.4, represents a sequence, Xo,...,x n _i, of values. The List interface includes the 
following operations: 

1. sizeQ: return n, the length of the list 

2. get(i): return the value x± 



3. set(i, x): set the value of x x equal to x 
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Figure 1.4: A List represents a sequence indexed by 0, l,2,...,n. In this List a call to 
get(2) would return the value c. 

4. add(i,x): add x at position i, displacing x i ,...,x n _ 1 ; 

Set xy +1 = Xj, for all ;' 6 {n - 1,. . ., i}, increment n, and set x, = x 

5. remove(i) remove the value x x , displacing x l+1 ,...,x n _ l ; 
Set Xj = xy +1 , for all j 6 {i,...,n -2} and decrement n 

Notice that these operations are easily sufficient to implement the Deque interface: 

addFirst(x) => add(0,x) 

removeFirst() =^> remove(O) 

addLast(x) => add(size(),x) 

removeLast() => remove(size() - 1) 

Although we will normally not discuss the Stack, Deque and FIFO Queue interfaces 
in subsequent chapters, the terms Stack and Deque are sometimes used in the names of 
data structures that implement the List interface. When this happens, it is to highlight the 
fact that these data structures can be used to implement the Stack or Deque interface very 
efficiently. For example, the ArrayDeque class is an implementation of the List interface 
that implements all the Deque operations in constant time per operation. 

1.1.3 The USet Interface: Unordered Sets 

The USet interface represents an unordered set of unique elements, mimicking a mathe- 
matical set. A USet contains n distinct elements; no element appears more than once; the 
elements are in no specific order. A USet supports the following operations: 

1. size(): return the number, n, of elements in the set 

2. add(x): add the element x to the set if not already present; 

Add x to the set provided that there is no element y in the set such that x equals y. 
Return true if x was added to the set and false otherwise. 
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3. remove(x): remove x from the set; 

Find an element y in the set such that x equals y and remove y. Return y, or null if 
no such element exists. 

4. f ind(x): find x in the set if it exists; 

Find an element y in the set such that y equals x. Return y, or null if no such element 
exists. 

These definitions are a bit fussy about distinguishing x, the element we are remov- 
ing or finding, from y, the element we remove or find. This is because x and y might 
actually be distinct objects that are nevertheless treated as equal. 2 This is a very useful 
distinction since it allows for the creation of dictionaries or maps that map keys onto val- 
ues. This is done by creating a compound object called a Pair that contains a key and a 
value. Two Pairs are treated as equal if their keys are equal. By storing Pairs in a USet, 
we can find the value associated with any key k by creating a Pair, x, with key k and using 
the £ ind(x) method. 

1.1.4 The SSet Interface: Sorted Sets 

The SSet interface represents a sorted set of elements. An SSet stores elements from some 
total order, so that any two elements x and y can be compared. In code examples, this will 
be done with a method called compare(x, y) in which 



compare(x, y) 



< if x < y 
> if x > y 
= if x = y 



An SSet supports the sizeQ, add(x), and remove(x) methods with exactly the same seman- 
tics as in the USet interface. The difference between a USet and an SSet is in the f ind(x) 
method: 

4. f ind(x): locate x in the sorted set; 

Find the smallest element y in the set such that y > x. Return y or null if no such 
element exists. 

This version of the f ind(x) operation is sometimes referred to as a successor search. 
It differs in a fundamental way from USet.f ind(x) since it returns a meaningful result even 
when there is no element in the set that is equal to x. 



"In Java, this is done by overriding the class' equals(y) and hashCode() methods. 
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The distinction between the USet and SSet f ind(x) operations is very important 
and is very often missed. The extra functionality provided by an SSet usually comes with 
a price that includes both a larger running time and a higher implementation complexity. 
For example, most of the SSet implementations discussed in this book all have f ind(x) 
operations with running times that are logarithmic in the size of the set. On the other 
hand, the implementation of a USet as a ChainedHashTable in Chapter 5 has a f ind(x) 
operation that runs in constant expected time. When choosing which of these structures 
to use, one should always use a USet unless the extra functionality offered by an SSet is 
really needed. 

1.2 Mathematical Background 

In this section, we review some mathematical notations and tools used throughout this 
book, including logarithms, big-Oh notation, and probability theory. This is intended to 
be a review of these topics, not an introduction. Any reader who feels they are missing 
this background is encouraged to read, and do exercises from, the appropriate sections of 
the very good (and free) textbook on mathematics for computer science [48]. 

1.2.1 Exponentials and Logarithms 

The expression b x denotes the number b raised to the power of x. If x is a positive integer, 
then this is just the value of b multiplied by itself x - 1 times: 

b x -bxbx---xb . 



When x is a negative integer, b x - \/b~ x . When x — 0, b x — 1. When b is not an integer, we 
can still define exponentiation in terms of the exponential function e x (see below), which 
is itself defined in terms of the exponential series, but this is best left to a calculus text. 

In this book, the expression log;, k denotes the base-b logarithm of k. That is, the 
unique value x that satisfies 

b x = k . 

Most of the logarithms in this book are base 2 (binary logarithms), in which case we drop 
the base, so that log A: is shorthand for log 2 fc. 

Another logarithm that comes up several times in this book is the natural logarithm. 
Here we use the notation In A: to denote log e k, where e — Euler's constant — is given by 



lim (l + -\ 

i->oo\ n) 



lim 1 + - * 2.71828 . 



I 



1. Introduction 1.2. Mathematical Background 

The natural logarithm comes up frequently because it is the value of a particularly com- 
mon integral: 

1/xdx = In A: . 
'l 

Two of the most common manipulations we do with logarithms are removing them from 
an exponent: 

and changing the base of a logarithm: 

For example, we can use these two manipulations to compare the natural and binary log- 
arithms 

lnk = We = (1^2) = (ln2)(lo8(: » - *«*>*"<** • 

1.2.2 Factorials 

In one or two places in this book, the factorial function is used. For a non-negative integer 
n, the notation n\ (pronounced "n factorial") is defined to mean 

n! = l-2-3 n . 

Factorials appear because n\ counts the number of distinct permutations, i.e., orderings, 
of n distinct elements. For the special case n = 0, 0! is defined as 1. 

The quantity n\ can be approximated using Stirling's Approximation: 

w! = V27tn(-)% a(n) , 

where 

< a(n) < 



12n + l x ' 12n 
Stirling's Approximation also approximates ln(n!): 

ln(n!) = nlnn- n + -ln(2nn) + a(n) 

(In fact, Stirling's Approximation is most easily proven by approximating ln(n!) = lnl + 

J'tl 
\nndn - nlnn-n+ I.) 

Related to the factorial function are the binomial coefficients. For a non-negative 
integer n and an integer k 6 {0,...,n}, the notation (£) denotes: 

n\ 



k) k\(n-k)\ 



1. Introduction 1.2. Mathematical Background 

The binomial coefficient (T) (pronounced "n choose k") counts the number of subsets of an 
n element set that have size k, i.e., the number of ways of choosing k distinct integers from 
the set {l,...,n}. 

1.2.3 Asymptotic Notation 

When analyzing data structures in this book, we will want to talk about the running times 
of various operations. The exact running times will, of course, vary from computer to 
computer and even from run to run on an individual computer. When we talk about the 
running time of an operation we are referring to the number of computer instructions 
performed during the operation. Even for simple code, this quantity can be difficult to 
compute exactly. Therefore, instead of analyzing running times exactly, we will use the 
so-called big-Oh notation: For a function f(n), 0(f(n)) denotes a set of functions, 

0(f(n)) - {g(n) : there exists c> 0, and m such that g(n) < c ■ f(n) for all n > n ] . 

Thinking graphically, this set consists of the functions g(n) where c-f(n) starts to dominate 
g(n) when n is sufficiently large. 

We generally use asymptotic notation to simplify functions. For example, in place 
of 5nlog« + 8m - 200 we can write, simply, O(nlogn). This is proven as follows: 

5«logn + 8 m - 200 < 5MlogM + 8m 

< 5nlog« + 8n log n for n > 2 (so that log« > 1) 

< 13m log n 

which demonstrates that the function f(n) = 5m log n + 8m- 200 is in the set O(logM) using 
the constants c = 13 and m = 2. 

There are a number of useful shortcuts when using asymptotic notation. First: 

0(n C| )cO(n c ') , 

for any c x < c 2 . Second: For any constants a,b,c> 0, 

O(a) c 0(logn) c 0(n b ) c 0(c") . 

These inclusion relations can be multiplied by any positive value, and they still hold. For 
example, multiplying by n yields: 

0(n) c O(MlogM) c 0(n l+h ) c 0(nc n ) . 
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Continuing in a long and distinguished tradition, we will abuse this notation by 
writing things like f\{n) = 0(f(n)) when what we really mean is f\{n) 6 0(f(n)). We 
will also make statements like "the running time of this operation is 0(f(n))" when this 
statement should be "the running time of this operation is a member of 0(f(n))." These 
shortcuts are mainly to avoid awkward language and to make it easier to use asymptotic 
notation within strings of equations. 

A particularly strange example of this comes when we write statements like 

T(n) = 2logn + 0(l) . 
Again, this would be more correctly written as 

T(n) < 21og« + [some member of 0(1)] . 

The expression 0(1) also brings up another issue. Since there is no variable in this 
expression, it may not be clear which variable is getting arbitrarily large. Without context, 
there is no way to tell. In the example above, since the only variable in the rest of the 
equation is n, we can assume that this should be read as T(n) - 21og« + 0(f(n)), where 
/(«) = !■ 

Big-Oh notation is not new or unique to computer science. It was used by number 
theorist Paul Bachmann as early as 1894, but it is immensely useful in describing the 
running times of computer algorithms. Consider a piece of code like the following: 

Simple 



void snippet ( ) { 

for (int i = 0; i < n; i++) 
a[i] = i; 
} 



One execution of this method involves 

• 1 assignment (inti = 0), 

• n + 1 comparisons (i < n), 

• n increments (i + +), 

• n array offset calculations (a[i]), and 

• n indirect assignments (a[i] = i). 
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So we could write this running time as 

T(n) = a + b(n + 1) + en + dn + en , 

where a, b, c, d, and e are constants that depend on the the machine running the code 
and represent the time to perform assignments, comparisons, increment operations, array 
offset calculations, and indirect assignments, respectively. If this is the expression for the 
running time of two lines of code, then clearly this kind of analysis will not be tractable to 
complicated code or algorithms. Using big-Oh notation, the running time simplifies down 
to 

T(n) = 0(n) . 

Not only is this more compact, it also gives nearly as much information. The fact that 
the running time depends on the constants a, b, c, d, and e in the above example means 
that, in general, it will not be possible to compare two running times to know which is 
faster without knowing the values of these constants. Even if we go through the effort of 
determining these constants (say, through timing tests) then our conclusion will only be 
valid for the machine we run our tests on. 

Big-Oh notation allows us to reason at a much higher level, making it possible 
to analyze much more complicated functions. If two algorithms have the same big-Oh 
running time, then we won't know which is faster, and there may not be a clear winner. 
One may be faster on one machine and the other may be faster on a different machine. 
However, if the two algorithms have a demonstrably different big-Oh running time, then 
we can be certain that the one with the smaller running time will be faster for large enough 
values ofn. 

An example of how big-Oh notation allows us to compare two different functions 
is shown in Figure 1.5, which compares the rate of grown of /i(n) = 15n versus J2{n) = 
2nlogn. It might be that f\{n) is the running time of a complicated linear time algorithm 
while f2(n) is the running time of a considerably simpler algorithm based on the divide- 
and-conquer paradigm. This illustrates that, although /i(n) is greater than f2(n) for small 
values of n, the opposite is true for large values of n. Eventually /i(n) wins out, by an 
increasingly margin. Analysis using big-Oh notation told us that this would happen, since 
0(n)cO(nlogn). 

In a few cases, we will use asymptotic notation on functions with more than one 
variable. There seems to be no standard for this, but for our purposes, the following 
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Figure 1.5: Plots of 15n versus 2nlogn. 
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definition is sufficient: 



0(f(n v ...,n k )) 



g{n x ,...,n k ) : there exists c> 0, and z such that 
g(n 1 ,...,n k )<c-f{n 1 ,...,n k ) 

for all n\,...,n k such that g{n x ,...,n k ) > z 



This definition captures the situation we really care about: when the arguments n\,...,n k 
make g take on large values. This agrees with the univariate definition of 0(f(n)) when 
/(n) is an increasing function of n. The reader should be warned that, although this works 
for our purposes, other texts may treat multivariate functions and asymptotic notation 
differently. 

1.2.4 Randomization and Probability 

Some of the data structures presented in this book are randomized; they make random 
choices that are independent of the data being stored in them or the operations being 
performed on them. For this reason, performing the same set of operations more than 
once using these structures could result in different running times. When analyzing these 
data structures we are interested in their average or expected running times. 

Formally the running time of an operation on a randomized data structure is a 
random variable and we want to study its expected value. For a discrete random variable X 
taking on values in some countable universe U, the expected value of X, denoted by E[X] 
is given the by the formula 

E[X] = J^x-J>r{X = x] . 
xeU 

Here Pr{£} denotes the probability that the event £ occurs. In all the examples in this 

book, these probabilities are only with respect to whatever random choices are made by 

the randomized data structure; there is no assumption that the data stored in the structure 

is random or that the sequence of operations performed on the data structure is random. 

One of the most important properties of expected values is linearity of expectation: 
For any two random variables X and Y , 

E[X + Y] = E[X] + E[Y] . 

More generally, for any random variables X\, . ..,X k , 

k I k 



r = l 



j^x k =jy[x { ] 



2 = 1 
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Linearity of expectation allows us to break down complicated random variables (like the 

left hand sides of the above equations) into sums of simpler random variables (the right 

hand sides). 

A useful trick, that we will use repeatedly, is that of defining indicator random 

variables. These binary variables are useful when we want to count something and are best 

illustrated by an example. Suppose we toss a fair coin k times and we want to know the 

expected number of times the coin comes up heads. Intuitively we know the answer is 

fc/2, but if we try to prove it using the definition of expected value, we get 

k 
E[X] = V'z-Pr{X = z} 

1=0 

k n ; 






;=o ' ' 
= k/2 . 

This requires that we know enough to calculate that Pr{X = i) - n)/2 fc , that we know the 
binomial identity i{ •) = k( j 1 ), and that we know the binomial identity H !=0 (,-) = 2 k . 

Using indicator variables and linearity of expectation makes things much easier: 
For each i e {1, . . . , k}, define the indicator random variable 

)1 if the fth coin toss is heads 
otherwise. 



Then 



Novf,X = Y% =1 i if so 



E[Ij] = (1/2)1 + (1/2)0 = 1/2 



E[X]=E 



k 
L 1=1 



i=i 

k 
£l/2 



1=1 
= k/2 . 
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This is a bit more long-winded, but doesn't require that we know any magical identities 
or compute any non-trivial probabilities. Even better: It agrees with the intuition that 
we expect half the coins to come up heads precisely because each individual coin has 
probability 1/2 of coming up heads. 



1.3 The Model of Computation 

In this book, we will analyze the theoretical running times of operations on the data struc- 
tures we study. To do this precisely we need a mathematical model of computation. For 
this, we use the w-bit word-RAM model. RAM stands for Random Access Machine. In this 
model, we have access to a random access memory consisting of cells, each of which stores 
a w-bit word. This implies a memory cell can represent, for example, any integer in the set 
{0,...,2 W -1}. 

In the word-RAM model, basic operations on words take constant time. This in- 
cludes arithmetic operations (+, -, *, /, %), comparisons (<, >, =, <, >), and bitwise boolean 
operations (bitwise-AND, OR, and exclusive-OR). 

Any cell can be read or written in constant time. Our computer's memory is man- 
aged by a memory management system from which we can allocate or deallocate a block 
of memory of any size we like. Allocating a block of memory of size k takes O(k) time and 
returns a reference to the newly-allocated memory block. This reference is small enough 
to be represented by a single word. 

The word-size, w, is a very important parameter of this model. The only assumption 
we will make on w is that it is at least w > logn, where n is the number of elements stored 
in any of our data structures. This is a fairly modest assumption, since otherwise a word 
is not even big enough to count the number of elements stored in the data structure. 

Space is measured in words so that, when we talk about the amount of space used 
by a data structure, we are referring to the number of words of memory used by the struc- 
ture. All our data structures store values of a generic type T and we assume an element 
of type T occupies one word of memory. (In reality, we are storing references to objects of 
type T, and these references occupy only one word of memory.) 

The w-bit word-RAM model is a fairly close match for the (32-bit) Java Virtual 
Machine (JVM) when w = 32. The data structures presented in this book don't use any 
special tricks that are not implementable on the JVM and most other architectures. 
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1.4 Correctness, Time Complexity, and Space Complexity 

When studying the performance of a data structure, there are three things that matter 
most: 

• Correctness: The data structure should correctly implement its interface. 

• Time complexity: The running times of operations on the data structure should be 
as small as possible. 

• Space complexity: The data structure should use as little memory as possible. 

In this introductory text, we will take correctness as a given; we won't consider data 
structures that give incorrect answers to queries or that don't properly perform updates. 
We will, however, see data structures that make an extra effort to keep space usage to a 
minimum. This won't usually affect the (asymptotic) running times of operations, but can 
make the data structures a little slower in practice. 

When studying running times in the context of data structure we tend to come 
across three different kinds of running time guarantees: 

• Worst-case running times: These are the strongest kind of running time guarantees. 
If a data structure operation has a worst-case running-time of f(n), then one of these 
operations never takes longer than f(n) time. 

• Amortized running times: If we say that the amortized running time of an operation 
in a data structure is /(n), this means that the cost of a typical operation is at most 
/(n). More precisely, if a data structure has an amortized running time of /(n), then a 
sequence of m operations takes at most m/(n) time. Some individual operations may 
take more than /(n) time but the average, over the entire sequence of operations, is 
at most /(n). 

• Expected running times: If we say that the expected running time of an operation on 
a data structure is /(n), this means that the actual running time is a random variable 
(see Section 1 .2.4) and the expected value of this random variable is at most /(n). The 
randomization here is with respect to random choices made by the data structure. 

To understand the difference between worst-case, amortized, and expected running 
times, it helps to consider some financial examples. In most countries, the amount of 
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income tax someone pays is proportional to their income. Someone who must pay $36,000 
of income taxes per year may choose to pay their taxes all at once, at the end of the fiscal 
year. Alternatively they may choose to have $3,000 of taxes deducted directly from their 
monthly paycheques. The latter person has a worst-case income tax cost of $3,000 per 
month, while the former person has an amortized monthly cost (over a 12 month period) 
of $3,000 per month. 

An example of an expected cost is the cost of automobile repair. Over a period 
of 12 months, an automobile may work fine, or it may break down. If we have enough 
information about automobiles, we may learn that the expected (or average) yearly repair 
cost for a particular make of automobile is $2,000. It may be more, or it may be less. There 
is no hard and fast guarantee about what repairs will cost over a year, two years, or even a 
decade. 

One may wonder why we would ever settle for an amortized or expected running- 
time over a worst-case running time. The reason this happens is that it is often possible 
to get a lower expected or amortized running-time than worst-case running-time. At the 
very least, it is very often possible to get a much simpler data structure if one is willing to 
settle for amortized or expected running times. 

The same scenario occurs in many financial settings as well. Mortgage companies 
allow homeowners to amortized enormous one-time costs (the cost of purchasing a house) 
over long periods of regular costs (mortgage payments). Home insurance companies pro- 
tect against unlikely but very expensive events (floods, fires, and so on) by replacing them 
with predictable worst-case costs (insurance premiums). In these cases, if one could afford 
the risk of a fire or flood, or the cost of buying house outright, it would be cheaper and 
simpler to forgo the mortgage and/or insurance. 

1.5 Code Samples 

The code samples in this book are written in the Java programming language. However 
to make the book accessible even to readers not familiar with all of Java's constructs and 
keywords, the code samples have been simplified. For example, a reader won't find any of 
the keywords public, protected, private, or static. A reader also won't find much dis- 
cussion about class hierarchies. Which interfaces a particular class implements or which 
class it extends, if relevant to the discussion, will be clear from the accompanying text. 

These conventions should make most of the code samples understandable by any- 
one with a background in any of the languages from the ALGOL tradition, including B, 
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List implementations 




get(i)/set(i,x) 


add(i, x)/remove(i) 




ArrayStack 


0(1) 


0(l+n-i) A 


§2.1 


ArrayDeque 


0(1) 


0(l+min{i,n-i}) A 


§2.4 


DualArray Deque 


0(1) 


0(1 + min{i, n-i}) A 


§2.5 


RootishArrayStack 


0(1) 


0(l + n-i) A 


§2.6 


DLList 


0(1 + min{i,n-i}) 


0(1 + min{i, n- 1}) 


§3.2 


SEList 


0(l + min{i,n-i}/b) 


0(b + min{i,n-i}/b) A 


§3.3 


SkiplistList 


0(logn) E 


0(logn) E 


§4.3 



USet implementations 




find(x) 


add(x)/remove(x) 




ChainedHashTable 
LinearHashTable 


0(1) E 
0(1) E 


0(1) A ' E 

(1)A,E 


§5.1 

§5.2 



A Denotes an amortized running time. 
E Denotes an expected running time. 

Table 1.1: Summary of List and USet implementations. 

C, C++, C#, D, Java, JavaScript, and so on. Readers who want the full details of all imple- 
mentations are encouraged to look at the Java source code that accompanies this book. 

This book mixes mathematical analysis of running times with Java source code for 
the algorithms being analyzed. This means that some equations contain variables also 
found in the source code. These variables are typeset consistently both within the source 
code and within equations. The most common such variable is the variable n that, without 
exception, always refers to the number of items currently stored in the data structure. 

1.6 List of Data Structures 

Tables 1.1 and 1.2 summarizes the performance of data structures in this book that im- 
plement each of the interfaces, List, USet, and SSet, described in Section 1.1. Figure 1.6 
shows the dependencies between various chapters in this book. A dashed arrow indicates 
only a weak-dependency in which only a small part of the chapter depends on a previous 
chapter or only the main results of the previous chapter. 
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SSet implementations 




find(x) 


add(x)/remove(x) 




SkiplistSSet 


0(logn) E 


0(logn) E 


§4.2 


Treap 


0(logn) E 


0(logn) E 


§7.2 


ScapegoatTree 


O(logn) 


0(logn) A 


§8.1 


RedBlackTree 


O(logn) 


O(logn) 


§9.2 


BinaryTrle 1 


0(w) 


O(w) 


§13.1 


XFastTrie 1 


0(logw) A ' E 


0(w) A ' E 


§13.2 


YFastTrie 1 


0(logw) A ' E 


0(logw) A < E 


§13.3 


BTree 


O(logn) 


0(B + logn) A 


§14.2 


BTree x 


0(log B n) 


0(log B n) 


§14.2 



(Priority) Queue implementations 




findMin() 


add(x)/remove() 




BinaryHeap 


0(1) 


0(logn) A 


§10.1 


MeldableHeap 


0(1) 


0(logn) E 


§10.2 



1 This structure can only store w-bit integer data. 
x This denotes the running time in the external-memory 
model; see Chapter 14. 

Table 1.2: Summary of SSet and priority Queue implementations. 
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1. Introduction 



2. Array-based lists 



6. Binary trees 





7. Random binary search trees 



8. Scapegoat trees 



9. Red-black trees 



10. Heaps 



»■ 12. Graphs 



13. Data structures for integers 



14. External-memory searching 



11. Sorting algorithms 



11.1.2. Quicksort 



11.1.3. Heapsort 



Figure 1.6: The dependencies between chapters in this book. 
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1.7 Discussion and Exercises 

The List, USet, and SSet interfaces described in Section 1.1 are influenced by the Java Col- 
lections Framework [52]. These are essentially simplified versions of the List, Set/Map, 
and SortedSet/SortedMap interfaces found in the Java Collections Framework. Indeed, 
the accompanying source code includes wrapper classes for making USet and SSet imple- 
mentations into Set, Map, SortedSet, and SortedMap implementations. 

For a superb (and free) treatment of the mathematics discussed in this chapter, in- 
cluding asymptotic notation, logarithms, factorials, Stirling's approximation, basic prob- 
ability, and lots more, see the textbook by Leyman, Leighton, and Meyer [48]. For a gen- 
tle calculus text that includes formal definitions of exponentials and logarithms, see the 
(freely available) classic text by Thompson [69]. 

For more information on basic probability, especially as it relates to computer sci- 
ence, see the textbook by Ross [63]. Another good reference, that covers both asymptotic 
notation and probability, is the textbook by Graham, Knuth, and Patashnik [35]. 

Readers wanting to brush up on their Java programming can find many Java tuto- 
rials online [54]. 

Exercise 1.1. This exercise is designed to help get the reader familiar with choosing the 
right data structure for the right problem. If implemented, they should be done by making 
use of an implementation of the relevant interface (Stack, Queue, Deque, USet, or SSet) 
provided by the Java Collections Framework. 

Solve the following problems by reading a text file one line at a time and perform- 
ing operations on each line in the appropriate data structure(s). Your implementations 
should be fast enough that even files containing a million lines can be processed in a few 
seconds. 

1. Read the input one line at a time and then write the lines out in reverse order, so that 
the last input line is printed first, then the second last input line, and so on. 

2. Read the first 50 lines of input and then write them out in reverse order. Read the 
next 50 lines and then write them out in reverse order. Do this until there are no 
more lines left to read, at which point any remaining lines should be output in re- 
verse order. 

In other words, your output will start with the 50th line, then the 49th, then the 48th, 
and so on down to the first line. This will be followed by the 100th line, followed by 
the 99th, and so on down to the 51st line. And so on. 
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Your code should never have to store more than 50 lines at any given time. 

3. Read the input one line at a time. At any point after reading the first 42 lines, if some 
line is blank (i.e., a string of length 0) then output the line that occured 42 lines prior 
to that one. For example, if Line 242 is blank, then your program should output line 
200. This program should be implemented so that it never stores more than 43 lines 
of the input at any given time. 

4. Read the input one line at a time and write each line to the output if it is not a 
duplicate of some previous input line. Take special care so that a file with a lot of 
duplicate lines does not use more memory than what is required for the number of 
unique lines. 

5. Read the input one line at a time and write each line to the output only if you have 
already read this line before. (The end result is that you remove the first occurrence 
of each line.) Take special care so that a file with a lot of duplicate lines does not use 
more memory than what is required for the number of unique lines. 

6. Read the entire input one line at a time. Then output all lines sorted by length, with 
the shortest lines first. In the case where two lines have the same length, resolve their 
order using the usual "sorted order." Duplicate lines should be printed only once. 

7. Do the same as the previous question except that duplicate lines should be printed 
the same number of times that they appear in the input. 

8. Read the entire input one line at a time and then output the even numbered lines 
(starting with the first line, line 0) followed by the odd-numbered lines. 

9. Read the entire input one line at a time and randomly permute the lines before out- 
putting them. To be clear: You should not modify the contents of any line. Instead, 
the same collection of lines should be printed, but in a random order. 

Exercise 1.2. From scratch, write and test implementations of the List, USet and SSet 
interfaces. These do not have to be efficient. They can be used later to test the correctness 
and performance of more efficient implementations. (The easiest way to do this is to store 
the elements in an array.) 

Exercise 1.3. Work to improve the performance of your implementations from the previous 
question using any tricks you can think of. Experiment and think about how you could 
improve the performance of add(i, x) and remove(i) in your List implementation. Think 
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about how you could improve the performance of the f ind(x) operation in your USet and 
SSet implementations. This exercise is designed to give you a feel for how difficult it can 
be to obtain efficient implementations of these interfaces. 
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Chapter 2 



Array-Based Lists 



In this chapter, we study implementations of the List and Queue interfaces where the 
underlying data is stored in an array called the backing array. The following table summa- 
rizes the running times of operations for the data structures presented in this chapter: 





get(i)/set(i,x) 


add(i, x)/remove(i) 


ArrayStack 


0(1) 


O(n-i) 


ArrayDeque 


0(1) 


0(min{i,n- i}) 


DualArrayDeque 


0(1) 


0(min{i,n- i}) 


RootishArray Stack 


O(l) 


O(n-i) 



Data structures that work by storing data in a single array have many advantages 
and limitations in common: 

• Arrays offer constant time access to any value in the array. This is what allows get(i) 
and set(i, x) to run in constant time. 

• Arrays are not very dynamic. Adding or removing an element near the middle of a 
list means that a large number of elements in the array need to be shifted to make 
room for the newly added element or to fill in the gap created by the deleted element. 
This is why the operations add(i, x) and remove(i) have running times that depend 
on n and i. 

• Arrays cannot expand or shrink. When the number of elements in the data struc- 
ture exceeds the size of the backing array a new array needs to be allocated and the 
data from the old array needs to be copied into the new array. This is an expensive 
operation. 
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The third point is important. The running times cited in the table above do not include the 
cost of growing and shrinking the backing array. We will see that, if carefully managed, 
the cost of growing and shrinking the backing array does not add much to the cost of an 
average operation. More precisely if we start with an empty data structure, and perform 
any sequence of m add(i, x) or remove(i) operations, then the total cost of growing and 
shrinking the backing array over the entire sequence of m operations is O(m). Although 
some individual operations are more expensive, the amortized cost, when amortized over 
all m operations, is only 0(1) per operation. 

2.1 ArrayStack: Fast Stack Operations Using an Array 

An ArrayStack implements the list interface using an array a, called the backing array. The 
list element with index i is stored in a[i]. At most times, a is larger than strictly necessary, 
so an integer n is used to keep track of the number of elements actually stored in a. In this 
way, the list elements are stored in a[0],. . .,a[n - 1] and, at all times, a. length > n. 
ArrayStack 



T[] a; 
int n; 
int size( ) { 

return n; 
} 



2.1.1 The Basics 

Accessing and modifying the elements of an ArrayStack using get(i) and set(i, x) is triv- 
ial. After performing any necessary bounds-checking we simply return or set, respectively 
a[i]. 

ArrayStack 



T get(int i) { 
return a[i] ; 

} 

T set(int i, T x) { 

T y = a[i]; 

a[i] = x; 

return y; 
} 



The operations of adding and removing elements from an ArrayStack are illus- 
trated in Figure 2.1. To implement the add(i, x) operation, we first check if a is already 
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2.1. ArrayStack: Fast Stack Operations Using an Array 
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b r e e d r 
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remove(4) 



remove(4) 



remove(4) 



set(2,i) 



Figure 2.1: A sequence of add(i, x) and remove(i) operations on an ArrayStack. Arrows 
denote elements being copied. Operations that result in a call to resize() are marked with 
an asterisk. 



full. If so, we call the method reslze() to increase the size of a. How resizeQ is im- 
plemented will be discussed later. For now, it is sufficient to know that, after a call to 
resizeQ, we can be sure that a. length > n. With this out of the way, we now shift the 
elements a[i], ...,a[n - 1] right by one position to make room for x, set a[i] equal to x, and 
increment n. 

ArrayStack 



void add(int i, T x) { 

if (n + 1 > a. length) resize(); 

for (int j = n; j > i; j--) 
a[j] = a[ j-1] ; 

a[i] = x; 

n++; 
} 
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If we ignore the cost of the potential call to resize(), the cost of the add(i, x) operation is 
proportional to the number of elements we have to shift to make room for x. Therefore the 
cost of this operation (ignoring the cost of resizing a) is 0(n - i + 1). 

Implementing the remove(i) operation is similar. We shift the elements a[i + 1 ],..., a[n - 1 
left by one position (overwriting a[i]) and decrease the value of n. After doing this, we 
check if n is getting much smaller than a. length by checking if a. length > 3n. If so, we 
call resizeQ to reduce the size of a. 

ArrayStack 



T remove(int i) { 

T x = a[i] ; 

for (int j = i; j < n-1; j++) 
a[ j] = a[ j+1]; 

n--; 

if (a. length >= 3*n) resize(); 

return x; 
} 



If we ignore the cost of the resizeQ method, the cost of a remove(i) operation is propor- 
tional to the number of elements we shift, which is 0(n - i). 

2.1.2 Growing and Shrinking 

The resizeQ method is fairly straightforward; it allocates a new array b whose size is 2n 
and copies the n elements of a into the first n positions in b, and then sets a to b. Thus, 
after a call to resizeQ, a. length = 2n. 

ArrayStack 



void resize( ) { 

T[] b = newArray( Math. max (n*2, 1 )) ; 
for (int i = 0; i < n; i++) { 
b[i] = a[i]; 

} 

a = b; 

} 



Analyzing the actual cost of the resizeQ operation is easy. It allocates an array b 
of size 2n and copies the n elements of a into b. This takes O(n) time. 

The running time analysis from the previous section ignored the cost of calls to 
resizeQ. In this section we analyze this cost using a technique known as amortized anal- 
ysis. This technique does not try to determine the cost of resizing during each individual 
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add(i, x) and remove(i) operation. Instead, it considers the cost of all calls to resize() 
during a sequence of m calls to add(i, x) or remove(i). In particular, we will show: 

Lemma 2.1. If an empty ArrayList is created and any sequence of m > 1 calls to add(i, x) and 
remove(i) are performed, then the total time spent during all calls to resize() is O(m). 

Proof. We will show that anytime resizeQ is called, the number of calls to add or remove 
since the last call to resizeQ is at least n/2 - 1. Therefore, if n,- denotes the value of n 
during the ith call to resize() and r denotes the number of calls to resizeQ, then the total 
number of calls to add(i, x) or remove(i) is at least 



r 



Yjnf/2-l)<m , 



i=i 



which is equivalent to 



> n, < 2m + 2r . 



On the other hand, the total time spent during all calls to resizeQ is 

r 

^0(n i )<0(m + r) = 0(m) , 
i=i 

since r is not more than m. All that remains is to show that the number of calls to add(i, x) 
or remove(i) between the (i - l)th and the ith call to resizeQ is at least n,/2. 

There are two cases to consider. In the first case, resizeQ is being called by 
add(i, x) because the backing array a is full, i.e., a. length = n = n,-. Consider the previ- 
ous call to resizeQ: After this previous call, the size of a was a. length, but the number 
of elements stored in a was at most a. length/2 = n,/2. But now the number of elements 
stored in a is n, = a. length, so there must have been at least n,/2 calls to add(i, x) since the 
previous call to resizeQ. 

The second case to consider is when resizeQ is being called by remove(i) because 
a. length > 3n = 3n ; . Again, after the previous call to resizeQ the number of elements 
stored in a was at least a. length/2 - l. 1 Now there are n, < a. length/3 elements stored in 
a. Therefore, the number of remove(i) operations since the last call to resizeQ is at least 

a. length/2 - 1 - a.length/3 = a.length/6 - 1 = (a.length/3)/2 - 1 > n,/2 - 1 . 



The - 1 in this formula accounts for the special case that occurs when n = and a. length = 1. 
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In either case, the number of calls to add(i, x) or remove(i) that occur between the (i - l)th 
call to resizeQ and the ith call to resize() is at least n,/2 - 1, as required to complete the 
proof. □ 

2.1.3 Summary 

The following theorem summarizes the performance of an ArrayStack: 

Theorem 2.1. An ArrayStack implements the List interface. Ignoring the cost of calls to 
resizeQ, an ArrayStack supports the operations 

• get(i) and set(i, x) in O(l) time per operation; and 

• add(i, x) and remove(i) in 0(1 + n - i) time per operation. 

Furthermore, beginning with an empty ArrayStack, any sequence ofm add(i, x) and remove(i) 
operations results in a total of 0(m) time spent during all calls to resize(). 

The ArrayStack is an efficient way to implement a Stack. In particular, we can 
implement push(x) as add(n, x) and pop() as remove(n- 1), in which case these operations 
will run in 0(1) amortized time. 

2.2 FastArrayStack: An Optimized ArrayStack 

Much of the work done by an ArrayStack involves shifting (by add(i, x) and remove(i)) 
and copying (by resizeQ) of data. In the implementations shown above, this was done 
using for loops. It turns out that many programming environments have specific func- 
tions that are very efficient at copying and moving blocks of data. In the C program- 
ming language, there are the memcpy(d, s, n) and memmove(d, s, n) functions. In C++ there is 
the std :: copy(aO, a1,b) and algorithm. In Java there is the System.arraycopy(s,i,d, j,n) 

method. 

FastArrayStack 



void resize( ) { 

T[] b = newArray (Math. max ( 2*n, 1 )) ; 
System. arraycopy (a, 0, b, 0, n); 
a = b; 

} 

void add(int i, T x) { 

if (n + 1 > a. length) resize(); 

System. arraycopy (a, i, a, i+1, n-i); 

a[i] = x; 

n++; 
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} 

T remove(int i) { 

T x = a[i] ; 

System. arraycopy (a, i+1, a, i, n-i-1); 

n--; 

if (a. length >= 3*n) resize(); 

return x; 
} 



These functions are usually highly optimized and may even use special machine 
instructions that can do this copying much faster than we could do using a for loop. 
Although using these functions does not asymptotically decrease the running times, it 
can still be a worthwhile optimization. In the Java implementations here, the use of 
System.arraycopy(s, i, d, j, n) resulted in speedups of a factor between 2 and 3, depending 
on the types of operations performed. Your mileage may vary. 

2.3 ArrayQueue: An Array-Based Queue 

In this section, we present the ArrayQueue data structure, which implements a FIFO (first- 
in-first-out) queue; elements are removed (using the remove() operation) from the queue 
in the same order they are added (using the add(x) operation). 

Notice that an ArrayStack is a poor choice for an implementation of a FIFO queue. 
The reason is that we must choose one end of the list to add to and then remove from the 
other end. One of the two operations must work on the head of the list, which involves 
calling add(i, x) or remove(i) with a value of i = 0. This gives a running time proportional 
to n. 

To obtain an efficient array-based implementation of a queue, we first notice that 
the problem would be easy if we had an infinite array a. We could maintain one index j 
that keeps track of the next element to remove and an integer n that counts the number of 
elements in the queue. The queue elements would always be stored in 

a[j],a[j + 1],...,a[j + n-1] . 

Initially, both j and n would be set to 0. To add an element, we would place it in a[ j + n] 
and increment n. To remove an element, we would remove it from a[ j], increment j, and 
decrement n. 

Of course, the problem with this solution is that it requires an infinite array. An 
ArrayQueue simulates this by using a finite array a and modular arithmetic. This is the kind 
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of arithmetic used when we are talking about the time of day. For example 10 o'clock plus 
5 hours gives 3 o'clock. Formally, we say that 

10 + 5 = 15 = 3 (mod 12) . 

We read the latter part of this equation as "15 is congruent to 3 modulo 12." We can also 
treat mod as a binary operator, so that 

15 mod 12 = 3. 

More generally for an integer a and positive integer m, a mod m is the unique 
integer r 6 {0, . , ,, m— 1} such that a = r + km for some integer k. Less formally the value r is 
the remainder we get when we divide a by m. In many programming languages, including 
Java, the mod operator is represented using the % symbol. 2 

Modular arithmetic is useful for simulating an infinite array since i mod a. length 
always gives a value in the range 0,..., a. length - 1. Using modular arithmetic we can store 
the queue elements at array locations 

a[j%a.length],a[(j + 1)%a.length],...,a[( j +n- 1)%a.length] . 

This treats a like a circular array in which array indices exceeding a. length - 1 "wrap 

around" to the beginning of the array. 

The only remaining thing to worry about is taking care that the number of elements 

in the ArrayQueue does not exceed the size of a. 
ArrayQueue 



T[] a; 
int j; 
int n; 



A sequence of add(x) and remove() operations on an ArrayQueue is illustrated in 

Figure 2.2. To implement add(x), we first check if a is full and, if necessary, call resizeQ 

to increase the size of a. Next, we store x in a[( j + n)%a. length] and increment n. 

ArrayQueue 



boolean add(T x) { 

if (n + 1 > a. length) resize(); 

a[(j+n) % a. length] = x; 

n++; 

return true; 
} 



This is sometimes referred to as the brain-dead mod operator since it does not correctly implement the 
mathematical mod operator when the first argument is negative. 
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Figure 2.2: A sequence of add(x) and remove(i) operations on an ArrayQueue. Arrows 
denote elements being copied. Operations that result in a call to resize() are marked with 
an asterisk. 
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To implement removeQ we first store a[j] so that we can return it later. Next, 
we decrement n and increment j (modulo a. length) by setting j = (j + 1) mod a. length. 
Finally, we return the stored value of a[j]. If necessary we may call resizeQ to decrease 
the size of a. 
ArrayQueue 



T remove ( ) { 

if (n == 0) throw new NoSuchElementException( ) ; 

T x = a[ j]; 

j = ( j + 1) % a. length; 

n--; 

if (a. length >= 3*n) resize(); 

return x; 
} 



Finally the resizeQ operation is very similar to the resize() operation of Array- 
Stack. It allocates a new array b of size 2n and copies 

a [jL a [(J + 1)*a.length],...,a[(j +n- 1)%a.length] 

onto 

b[0],b[1],...,b[n-1] 

and sets j = 0. 
ArrayQueue 



void resize( ) { 

T[] b = newArray( Math. max ( 1 ,n*2) ) ; 

for (int k = 0; k < n; k++) 
b[k] = a[( j+k) % a. length] ; 

a = b; 

j = 0; 
} 



2.3.1 Summary 

The following theorem summarizes the performance of the ArrayQueue data structure: 

Theorem 2.2. An ArrayQueue implements the (FIFO) Queue interface. Ignoring the cost of 
calls to resizeQ, an ArrayQueue supports the operations add(x) and removeQ in 0(1) time 
per operation. Furthermore, beginning with an empty ArrayQueue, any sequence of m add(i, x) 
and remove(i) operations results in a total ofO(m) time spent during all calls to resizeQ. 
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2.4 ArrayDeque: Fast Deque Operations Using an Array 

The ArrayOueue from the previous section is a data structure for representing a sequence 
that allows us to efficiently add to one end of the sequence and remove from the other end. 
The ArrayDeque data structure allows for efficient addition and removal at both ends. This 
structure implements the List interface using the same circular array technique used to 
represent an ArrayOueue. 
ArrayDeque 



T[] a; 
int j; 

int n; 



The get(i) and set(i, x) operations on an ArrayDeque are straightforward. They 
;et or set the array element a[( j + i) mod a. length]. 

ArrayDeque 



T get(int i) { 

return a[ ( j+i)%a. length] ; 

} 

T set (int i, T x) { 

T y = a[( j+i)%a. length]; 

a[ ( j+i )%a. length] = x; 

return y; 
} 



The implementation of add(i, x) is a little more interesting. As usual, we first check 
if a is full and, if necessary, call resize() to resize a. Remember that we want this operation 
to be fast when i is small (close to 0) or when i is large (close to n). Therefore, we check 
if i < n/2. If so, we shift the elements a[0],...,a[i - 1] left by one position. Otherwise 
(i > n/2), we shift the elements a[i],...,a[n - 1] right by one position. See Figure 2.3 for an 
illustration of add(i, x) and remove(x) operations on an ArrayDeque. 

ArrayDeque 



void add(int i, T x) { 

if (n+1 > a. length) resize(); 

if (i < n/2) { // shift a[0] , . . ,a[i-1 ] left one position 
j = (j == 0) ? a. length - 1 : j - 1; // (j-1) mod a. length 
for (int k = 0; k <= i-1; k++) 

a[ (j+k)%a. length] = a[ ( j+k+1 )%a. length] ; 
} else { // shift a[i] , . . ,a[n-1 ] right one position 
for (int k = n; k > i; k--) 

a[(j+k)%a. length] = a[ ( j+k-1 )%a. length] ; 



35 



2. Array-Based Lists 



2.4. ArrayDeque: Fast Deque Operations Using an Array 



j = 0, n = 8 


a 


b 


c 


d 


e 


f 


g 


h 












\\ 


j = l,n = 7 




a 


b 


d 


e 


f 


g 


h 












\\\ 


j = l,n = 8 




a 


b 


d 


e 


X 


f 


g 


h 










/// 


j = 0, n = 9 


a 


b 


d 


y 


e 


X 


f 


g 


h 










Z~~7^ — — — — 


j = ll, n = 10 


b 


d 


y 


z 


e 


X 


f 


g 


h 






a 



01 2 3456789 10 11 



remove(2) 
add(4,x) 
add(3,y) 
add(4,z) 



Figure 2.3: A sequence of add(i, x) and remove(i) operations on an ArrayDeque. Arrows 
denote elements being copied. 



} 




a[ (j+i)%a. length] : 


-- x; 


n++; 




} 





By doing the shifting in this way, we guarantee that add(i, x) never has to shift more 
than minji, n- i} elements. Thus, the running time of the add(i, x) operation (ignoring the 
cost of a resize() operation) is 0(1 + min{i, n - i}). 

The remove(i) operation is similar. It either shifts elements a[0],...,a[i - 1] right 
by one position or shifts the elements a[i + 1 ], . . . , a[n - 1 ] left by one position depending on 
whether i < n/2. Again, this means that remove(i) never spends more than 0(l+min{i, n- 
i}) time to shift elements. 

ArrayDeque 



T remove(int i) { 

T x = a[ ( j+i)%a. length] ; 

if (i < n/2) { // shift a[0] , . . , [ i-1 ] right one position 
for (int k = i; k > 0; k--) 

a[ (j+k)%a. length] = a[ ( j+k-1 )%a. length] ; 
j = (j + 1) % a. length; 
} else { // shift a[i+1 ] , . . ,a[n- 1 ] left one position 
for (int k = i; k < n-1; k++) 

a[ (j+k)%a. length] = a[ ( j+k+1 )%a. length] ; 

} 
n--; 

if (3*n < a. length) resize(); 
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return x; 
} 



2.4.1 Summary 

The following theorem summarizes the performance of the ArrayDeque data structure: 

Theorem 2.3. An ArrayDeque implements the List interface. Ignoring the cost of calls to 
resizeQ, an ArrayDeque supports the operations 

• get(i) and set(i, x) in O(l) time per operation; and 

• add(i, x) and remove(i) in 0(1 + min{i, n - i}) time per operation. 

Furthermore, beginning with an empty ArrayDeque, any sequence ofm add(i, x) and remove(i) 
operations results in a total of 0(m) time spent during all calls to resizeQ. 

2.5 DualArrayDeque: Building a Deque from Two Stacks 

Next, we present another data structure, the DualArrayDeque that achieves the same per- 
formance bounds as an ArrayDeque by using two ArrayStacks. Although the asymptotic 
performance of the DualArrayDeque is no better than that of the ArrayDeque, it is still 
worth studying since it offers a good example of how to make a sophisticated data struc- 
ture by combining two simpler data structures. 

A DualArrayDeque represents a list using two ArrayStacks. Recall that an Array- 
Stack is fast when the operations on it modify elements near the end. A DualArrayDeque 
places two ArrayStacks, called front and back, back-to-back so that operations are fast at 

either end. 
DualArrayDeque 



List<T> front; 
List<T> back; 



A DualArrayDeque does not explicitly store the number, n, of elements it contains. 

It doesn't need to, since it contains n = front.size() + back.sizeQ elements. Nevertheless, 

when analyzing the DualArrayDeque we will still use n to denote the number of elements 

it contains. 

DualArrayDeque 



int size( ) { 

return front. size() + back.size(); 
} 
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The front ArrayStack contains list elements with indices 0,..., front. size() - 1, 

but stores them in reverse order. The back ArrayStack contains list elements with indices 

front.size(), ...,size() - 1 in the normal order. In this way, get(i) and set(i, x) translate 

into appropriate calls to get(i) or set(i, x) on either front or back, which take 0(1) time 

per operation. 

DualArrayDeque 



T get(int i) { 

if (i < front . size( ) ) { 

return front .get (front .size()-i-1); 
} else { 

return back . get (i-front . size( )) ; 
} 
} 
T set(int i, T x) { 

if (i < front . size( ) ) { 

return front . set (front . size( )-i-1 , x); 

} else { 

return back . set (i-front . size( ) , x); 
} 
} 



Note that, if an index i < front. size(), then it corresponds to the element of front 

at position front. size() - i - 1, since the elements of front are stored in reverse order. 

Adding and removing elements from a DualArrayDeque is illustrated in Figure 2.4. 

The add(i, x) operation manipulates either front or back, as appropriate: 
DualArrayDeque 



void add(int i, T x) { 
if (i < front . size( ) ) { 

front .add(front . size( ) -i, x); 
} else { 

back .add(i-front .size( ) , x); 

} 
balance( ) ; 

} 



The add(i, x) method performs rebalancing of the two ArrayStacks front and 
back, by calling the balanceQ method. The implementation of balance() is described be- 
low, but for now it is sufficient to know that balanceQ ensures that, unless sizeQ < 2, 
front. size() and back.size() do not differ by more than a factor of 3. In particular, 
3 • front. size() > back.sizeQ and 3 • back.sizeQ > front. size(). 
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Figure 2.4: A sequence of add(i, x) and remove(i) operations on a DualArrayDeque. Ar- 
rows denote elements being copied. Operations that result in a rebalancing by balance() 
are marked with an asterisk. 

Next we analyze the cost of add(i, x), ignoring the cost of the balance() operation. 
If i < front. sizeQ, then add(i, x) becomes front. add(front.size() - i - 1, x). Since front 
is an ArrayStack, the cost of this is 



O(front.size()-(front.size()-i -!) + !) = 0(i + l) . 



(2.1) 



On the other hand, if i > front. size(), then add(i, x) becomes back.add(i - front. size(), x) 
The cost of this is 



0(back.size()- (i - front. si ze()) + 1) = 0(n - i + 1) . 



(2.2) 



Notice that the first case (2.1) occurs when i < n/4. The second case (2.2) occurs 
when i > 3n/4. When n/4 < i < 3n/4, we can't be sure whether the operation affects front 
or back, but in either case, the operation takes O(n) = O(i) = 0(n - i) time, since i > n/4 
and n - i > n/4. Summarizing the situation, we have 

0(1 + 1) ifi<n/4 

Running time of add(i, x) < \ O(n) if n/4 < i < 3n/4 

0(l + n-i) ifi>3n/4 

Thus, the running time of add(i, x) (ignoring the cost of the call to balance()) is 0(1 + 
minji, n - i}). 
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The remove(i) operation, and its analysis, is similar to the add(i, x) operation. 

DualArrayDeque 



T remove(int i) { 
T x; 
if (i < front . size( ) ) { 

x = front .remove(front . size( )-i-1 ) ; 
} else { 

x = back . remove (i-front . size( ) ) ; 

} 

balance( ) ; 
return x; 
} 



2.5.1 Balancing 

Finally, we study the balanceQ operation performed by add(i, x) and remove(i). This 
operation is used to ensure that neither front nor back gets too big (or too small). It 
ensures that, unless there are fewer than 2 elements, each of front and back contain at 
least n/4 elements. If this is not the case, then it moves elements between them so that 
front and back contain exactly [n/2J elements and [n/2] elements, respectively. 

DualArrayDeque 



void balance( ) { 
int n = size( ) ; 
if (3*front . size( ) < back.size()) { 

int s = n/2 - front .size( ) ; 

l_ist<T> 11 = newStack(); 

List<T> 12 = newStack(); 

H.addAll(back.subl_ist(0,s)); 

Collect ions. re verse (11 ) ; 

H.addAll( front); 

12.addAll(back. subl_ist(s, back. size( ) ) ) ; 

front =11; 

back = 12; 
} else if (3*back . size( ) < front . size( ) ) { 

int s = front. size() - n/2; 

l_ist<T> 11 = newStack(); 

List<T> 12 = newStack(); 

11 . addAll( front .subl_ist(s, front .size ( ) ) ) ; 

12.addAll(front.subl_ist(0, s)); 

Collections. reverse(12) ; 

12.addAll(back) ; 

front =11; 

back = 12; 
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} 
} 



There is not much to analyze. If the balance() operation does rebalancing, then it 
moves O(n) elements and this takes O(n) time. This is bad, since balance() is called with 
each call to add(i, x) and remove(i). However, the following lemma shows that, on average, 
balance() only spends a constant amount of time per operation. 

Lemma 2.2. If an empty DualArrayDeque is created and any sequence of m > 1 calls to 
add(i, x) and remove(i) are performed, then the total time spent during all calls to balance() 
is O(m). 

Proof. We will show that, if balanceQ is forced to shift elements, then the number of 
add(i, x) and remove(i) operations since the last time balanceQ shifted any elements is at 
least n/2 - 1. As in the proof of Lemma 2.1, this is sufficient to prove that the total time 
spent by balance() is O(m). 

We will perform our analysis using a technique knows as the potential method. De- 
fine the potential, O, of the DualArrayDeque as the difference in size between front and 
back: 

O = |front.size()- back.size()| . 

The interesting thing about this potential is that a call to add(i, x) or remove(i) that does 
not do any balancing can increase the potential by at most 1. 

Observe that, immediately after a call to balanceQ that shifts elements, the poten- 
tial, O , is at most 1, since 

O = |Ln/2j - Tn/211 < 1 . 

Consider the situation immediately before a call to balanceQ that shifts elements 
and suppose, without loss of generality, that balanceQ is shifting elements because 3f ront.sizeQ < 
back.sizeQ. Notice that, in this case, 

n = front. sizeQ + back.sizeQ 

< back. sizeQ/3 + back.sizeQ 

4 
= -back.sizeQ 
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Furthermore, the potential at this point in time is 

01 = back. size() - front. size() 

> back. size() - back. size()/3 

2 
= — back.sizeO 
3 

2 3 

> - x -n 

3 4 

= n/2 

Therefore, the number of calls to add(i, x) or remove(i) since the last time balance() 
shifted elements is at least C^ - O > n/2 - 1. This completes the proof. □ 

2.5.2 Summary 

The following theorem summarizes the performance of a DualArrayStack 

Theorem 2.4. A DualArrayDeque implements the List interface. Ignoring the cost of calls to 
resizeQ and balance(), a DualArrayDeque supports the operations 

• get(i) and set(i, x) in O(l) time per operation; and 

• add(i, x) and remove(i) in 0(1 +min{i, n - i}) time per operation. 

Furthermore, beginning with an empty DualArrayDeque, any sequence of m add(i, x) and 
remove(i) operations results in a total of 0(m) time spent during all calls to resizeQ and 
balanceQ. 

2.6 RootishArrayStack: A Space-Efficient Array Stack 

One of the drawbacks of all previous data structures in this chapter is that, because they 
store their data in one or two arrays, and they avoid resizing these arrays too often, the 
arrays are frequently not very full. For example, immediately after a resize() operation 
on an ArrayStack, the backing array a is only half full. Even worse, there are times when 
only 1/3 of a contains data. 

In this section, we discuss a data structure, the RootishArrayStack, that addresses 
the problem of wasted space. The RootishArrayStack stores n elements using O(Vn) 
arrays. In these arrays, at most 0( V") array locations are unused at any time. All remaining 
array locations are used to store data. Therefore, these data structures waste at most 0(s[n) 
space when storing n elements. 
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Figure 2.5: A sequence of add(i, x) and remove(i) operations on a RootishArrayStack. 
Arrows denote elements being copied. 



A RootishArrayStack stores its elements in a list of r arrays called blocks that are 
numbered 0,1, ...,r- 1. See Figure 2.5. Block b contains b+\ elements. Therefore, all r 
blocks contain a total of 

l+2 + 3 + --- + r = r(r+l)/2 

elements. The above formula can be obtained as shown in Figure 2.6. 
RootishArrayStack 



List<T[ ]> blocks; 
int n; 



The elements of the list are laid out in the blocks as we might expect. The list 
element with index is stored in block 0, the elements with list indices 1 and 2 are stored 
in block 1, the elements with list indices 3, 4, and 5 are stored in block 2, and so on. The 
main problem we have to address is that of determining, given an index i, which block 
contains i as well as the index corresponding to i within that block. 

Determining the index of i within its block turns out to be easy. If index i is in 
block b, then the number of elements in blocks 0,...,b-l is b(b + l)/2. Therefore, i is stored 
at location 

j =i-b(b + l)/2 
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r+1 



Figure 2.6: The number of white squares is 1 + 2 + 3-1 hr. The number of shaded squares 

is the same. Together the white and shaded squares make a rectangle consisting of r(r+ 1) 
squares. 



within block b. Somewhat more challenging is the problem of determining the value of 
b. The number of elements that have indices less than or equal to i is i + 1. On the other 
hand, the number of elements in blocks 0,. . . ,b is (b+ l)(b+2)/2. Therefore, b is the smallest 
integer such that 

(b+l)(b + 2)/2>i + l . 



We can rewrite this equation as 



b 2 + 3b-2i >0 



The corresponding quadratic equation b 2 +3b-2i = has two solutions: b = (-3+V9 + 8i)/2 
and b = (-3 - V9 + 8i)/2. The second solution makes no sense in our application since it 
always gives a negative value. Therefore, we obtain the solution b = (-3 + V9 + 8i)/2. In 
general, this solution is not an integer, but going back to our inequality, we want the small- 
est integer b such that b > (-3 + V9 + 8i)/2. This is simply 

b = [(-3 + V9 + 8i)/2] . 
RootishArrayStack 



int i2b(int i) { 
double db = (-3.0 + Math.sqrt(9 + 8*i)) / 2.0; 
int b = (int)Math.ceil(db); 
return b; 



} 



With this out of the way, the get(i) and set(i, x) methods are straightforward. We 
first compute the appropriate block b and the appropriate index j within the block and 
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then perform the appropriate operation: 

RootishArrayStack 



T get(int i) { 
int b = i2b(i); 
int j = i - b*(b+1)/2; 
return blocks . get (b) [ j ] ; 

} 

T set (int i, T x) { 

int b = i2b(i); 

int j = i - b*(b+1)/2; 

T y = blocks .get (b) [ j ] ; 

blocks . get (b) [ j ] = x; 

return y; 
} 



If we use any of the data structures in this chapter for representing the blocks list, 
then get(i) and set(i, x) will each run in constant time. 

The add(i, x) method will, by now, look familiar. We first check if our data structure 
is full, by checking if the number of blocks r is such that r(r + l)/2 = n and, if so, we call 
growQ to add another block. With this done, we shift elements with indices i, ...,n - 1 to 
the right by one position to make room for the new element with index i: 
RootishArrayStack 



void add(int i, T x) { 

int r = blocks. size( ) ; 

if (r*(r+1)/2 < n + 1) grow(); 

n++; 

for (int j = n-1; j > i; j--) 
set( j, get( j-1) ); 

set(i, x); 
} 



The growQ method does what we expect. It adds a new block: 
RootishArrayStack 



void grow( ) { 

blocks . add (newArray (blocks .size()+1)); 
} 



Ignoring the cost of the grow() operation, the cost of an add(i, x) operation is dom- 
inated by the cost of shifting and is therefore 0(1 + n - i), just like an ArrayStack. 
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The remove(i) operation is similar to add(i, x). It shifts the elements with indices 
i + 1, . . . , n left by one position and then, if there is more than one empty block, it calls the 
shrink() method to remove all but one of the unused blocks: 

RootishArrayStack 



T remove (int i) { 

T x = get(i); 

for (int j = i; j < n-1; j++) 
set(j, get(j+1)); 

n--; 

int r = blocks. size( ) ; 

if ((r-2)*(r-1)/2 >= n) shrink(); 

return x; 
} 



RootishArrayStack 

void shrink( ) { 

int r = blocks .size( ) ; 
while (r > && (r-2)*(r-1 )/2 >= n) { 
blocks .remove (blocks . size( )-1 ) ; 
r--; 
} 
} 



Once again, ignoring the cost of the shrinkQ operation, the cost of a remove(i) 
operation is dominated by the cost of shifting and is therefore 0(n - i). 

2.6.1 Analysis of Growing and Shrinking 

The above analysis of add(i, x) and remove(i) does not account for the cost of grow() and 
shrinkQ. Note that, unlike the ArrayStack.resizeQ operation, grow() and shrinkQ do not 
do any copying of data. They only allocate or free an array of size r. In some environments, 
this takes only constant time, while in others, it may require time proportional to r. 

We note that, immediately after a call to grow() or shrinkQ, the situation is clear. 
The final block is completely empty and all other blocks are completely full. Another 
call to grow() or shrinkQ will not happen until at least r- 1 elements have been added or 
removed. Therefore, even if growQ and shrinkQ take O(r) time, this cost can be amortized 
over at least r-1 add(i, x) or remove(i) operations, so that the amortized cost of growQ and 
shrinkQ is 0(1) per operation. 



46 



2. Array-Based Lists 2.6. RootishArrayStack: A Space-Efficient Array Stack 

2.6.2 Space Usage 

Next, we analyze the amount of extra space used by a RootishArrayStack. In particular, 
we want to count any space used by a RootishArrayStack that is not an array element 
currently used to hold a list element. We call all such space wasted space. 

The remove(i) operation ensures that a RootishArrayStack never has more than 
2 blocks that are not completely full. The number of blocks, r, used by a RootishArray- 
Stack that stores n elements therefore satisfies 

(r-2)(r-l)<n . 

Again, using the quadratic equation on this gives 

r<(3 + Vl+4n)/2 = 0(Vn) . 

The last two blocks have sizes r and r- 1, so the space wasted by these two blocks is at 
most 2r-l = 0(y/n). If we store the blocks in (for example) an ArrayList, then the amount 
of space wasted by the List that stores those r blocks is also O(r) = 0( V"). The other space 
needed for storing n and other accounting information is 0(1). Therefore, the total amount 
of wasted space in a RootishArrayStack is 0(y/n). 

Next, we argue that this space usage is optimal for any data structure that starts 
out empty and can support the addition of one item at a time. More precisely, we will 
show that, at some point during the addition of n items, the data structure is wasting an 
amount of space at least in ^/n (though it may be only wasted for a moment). 

Suppose we start with an empty data structure and we add n items one at a time. 
At the end of this process, all n items are stored in the structure and they are distributed 
among a collection of r memory blocks. If r > Vn, then the data structure must be using 
r pointers (or references) to keep track of these r blocks, and this is wasted space. On the 
other hand, if r < V" then, by the pigeonhole principle, some block must have size at least 
n/r > y/n. Consider the moment at which this block was first allocated. Immediately after 
it was allocated, this block was empty and was therefore wasting Vn space. Therefore, at 
some point in time during the insertion of n elements, the data structure was wasting Vn 
space. 

2.6.3 Summary 

The following theorem summarizes the performance of the RootishArrayStack data struc- 
ture: 
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Theorem 2.5. A RootishArrayStack implements the List interface. Ignoring the cost of calls 
to grow() and shrinkQ, a RootishArrayStack supports the operations 

• get(i) and set(i, x) in O(l) time per operation; and 

• add(i, x) and remove(i) in 0(1 + n - i) time per operation. 

Furthermore, beginning with an empty RootishArrayStack, any sequence of m add(i, x) and 
remove(i) operations results in a total of 0(m) time spent during all calls to grow() and shrink(). 

The space (measured in words) 3 used by a RootishArrayStack that stores n elements 
is n + O(Vn). 

2.6.4 Computing Square Roots 

A reader who has had some exposure to models of computation may notice that the Rootish- 
ArrayStack, as described above, does not fit into the usual word-RAM model of compu- 
tation (Section 1.3) because it requires taking square roots. The square root operation is 
generally not considered a basic operation and is therefore not usually part of the word- 
RAM model. 

In this section, we take time to show that the square root operation can be imple- 
mented efficiently. In particular, we show that for any integer x 6 {0, ...,n}, [V^J can be 
computed in constant-time, after 0(V") preprocessing that creates two arrays of length 
O(Vn). The following lemma shows that we can reduce the problem of computing the 
square root of x to the square root of a related value x'. 

Lemma 2.3. Let x > 1 and let x' = x - a, where < a < -\fx. Then Vx 7 > V* ~ 1- 

Proof. It suffices to show that 

Jx - V* > Vx - 1 • 

Square both sides of this inequality to get 

x - V* > x - 2yfx + 1 

and gather terms to get 

Vx>l 

which is clearly true for any x > 1. □ 



Recall Section 1.3 for a discussion of how memory is measured. 
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Start by restricting the problem a little, and assume that 2 r < x < 2 r+1 , so that 
[logxj = r, i.e., x is an integer having r + 1 bits in its binary representation. We can take 
x' = x - (x mod 2L r/2 J ). Now, x' satisfies the conditions of Lemma 2.3, so V* ~ Vx 7 < 1. 
Furthermore, x' has all of its lower-order |_r/2J bits equal to 0, so there are only 

2 r+l-Lr/2j< 4 . 2 r/2< 4 ^ 

possible values of x'. This means that we can use an array sqrttab, that stores the value 
of [Vx'J for each possible value of x'. A little more precisely, we have 

sqrttab[z] = [Vz'2Lr/2jJ _ 

In this way sqrttab[z] is within 2 of Vx for all x 6 {i2^- r/2 ^...,(i+l)2^- r/2 ^-l}. Stated another 
way the array entry s = sqrttab[x>>[r/2J] is either equal to LV*J> l_V*J ~~ 1> or LV*J ~~ 2. 
From s we can determine the value of |_V*J by incrementing s until (s + l) 2 > x. 

— FastSqrt 



int sqrt(int x, int r) { 

int s = sqrtab[x>>r/2] ; 

while ((s+1)*(s+1) <= x) s++; // executes at most twice 

return s; 
} 



Now, this only works for x 6 {2 r , ...,2 r+1 - 1} and sqrttab is a special table that 
only works for a particular value of r = [log xj. To overcome this, we could compute [log nj 
different sqrttab arrays, one for each possible value of [logxj. The sizes of these tables 
form an exponential sequence whose largest value is at most 4Vn, so the total size of all 
tables is O(yjri). 

However, it turns out that more than one sqrttab array is unnecessary; we only 
need one sqrttab array for the value r = [lognj. Any value x with logx = r' < r can be 
upgraded by multiplying x by 2 r ~ r and using the equation 

V2^'x = 2< r - r '> /2 Vx" . 

The quantity 2 r ~ r x is in the range {2 r , ...,2 r+1 - 1} so we can look up its square root in 

sqrttab. The following code implements this idea to compute LV^J for all non-negative 

integers x in the range {0, ...,2 30 - 1} using an array, sqrttab, of size 2 16 . 

FastSqrt 



int sqrt(int x) { 
int rp = log(x) ; 
int upgrade = ((r-rp)/2) * 2; 
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int xp = x << upgrade; // xp has r or r-1 bits 
int s = sqrtab[xp>>(r/2) ] >> (upgrade/2); 
while ((s+1)*(s+1) <= x) s++; // executes at most twice 
return s; 
} 



Something we have taken for granted thus far is the question of how to compute 
r' = [logxj. Again, this is a problem that can be solved with an array, logtab, of size 
2 r/2 . In this case, the code is particularly simple, since [log xj is just the index of the 
most significant 1 bit in the binary representation of x. This means that, for x > 2 r/2 , we 
can right-shift the bits of x by r/2 positions before using it as an index into logtab. The 
following code does this using an array logtab of size 2 16 to compute [log xj for all x in 
the range {1,...,2 32 - 1} 
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Finally, for completeness, we include the following code that initializes logtab and 

sqrttab: 
FastSqrt 



void inittabs( ) { 

sqrtab = new int [ 1<<(r/2) ] ; 
logtab = new int [ 1<<(r/2) ] ; 
for (int d = 0; d < r/2; d++) 

Arrays. fill(logtab, 1<<d, 2<<d, d); 
int s = 1«(r/4); // sqrt (2" (r/2) ) 

for (int i = 0; i < 1«(r/2); i++) { 

if ((s+1)*(s+1) <= i << (r/2)) s++; // sqrt increases 

sqrtab[i] = s; 
} 
} 



To summarize, the computations done by the i2b(i) method can be implemented 
in constant time on the word-RAM using 0(y/n) extra memory to store the sqrttab and 
logtab arrays. These arrays can be rebuilt when n increases or decreases by a factor of 2, 
and the cost of this rebuilding can be amortized over the number of add(i, x) and remove(i) 
operations that caused the change in n in the same way that the cost of resizeQ is analyzed 
in the ArrayStack implementation. 
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2.7 Discussion and Exercises 

Most of the data structures described in this chapter are folklore. They can be found in im- 
plementations dating back over 30 years. For example, implementations of stacks, queues, 
and deques which generalize easily to the ArrayStack, ArrayQueue and ArrayDeque struc- 
tures described here are discussed by Knuth [44, Section 2.2.2]. 

Brodnik et al. [12] seem to have been the first to describe the RootishArrayStack 
and prove a yfn lower-bound like that in Section 2.6.2. They also present a different struc- 
ture that uses a more sophisticated choice of block sizes in order to avoid computing square 
roots in the 12b(i) method. With their scheme, the block containing i is block Llog(i + 1)J, 
which is just the index of the leading 1 bit in the binary representation of i + 1. Some com- 
puter architectures provide an instruction for computing the index of the leading 1-bit in 
an integer. 

A structure related to the RootishArrayStack is the 2-level tiered-vector of Goodrich 
and Kloss [33]. This structure supports get(i, x) and set(i, x) in constant time and add(i, x) 
and remove(i) in O(Vn) time. These running times are similar to what can be achieved 
with the more careful implementation of a RootishArrayStack discussed in Exercise 2.10. 

Exercise 2.1. In the ArrayStack implementation, after the first call to remove(i), the back- 
ing array, a, contains n + 1 non-null values even though the ArrayStack only contains n 
elements. Where is the extra non-null value? Discuss any consequences this might have 
on the Java Runtime Environment's memory manager. 

Exercise 2.2. The List method addAll(i, c) inserts all elements of the Collection c into 
the list at position i. (The add(i, x) method is a special case where c = {x}.) Explain why, for 
the data structures in this chapter, it is not efficient to implement addAll(i, c) by repeated 
calls to add(i, x). Design and implement a more efficient implementation. 

Exercise 2.3. Design and implement a RandomQueue. This is an implementation of the 
Queue interface in which the remove() operation removes an element that is chosen uni- 
formly at random among all the elements currently in the queue. (Think of a RandomQueue 
as a bag in which we can add elements or reach in and blindly remove some random ele- 
ment.) The add(x) and remove() operations in a RandomQueue should run in constant time 
per operation. 

Exercise 2.4. Design and implement a Treque (triple-ended queue). This is a List imple- 
mentation in which get(i) and set(i, x) run in constant time and add(i, x) and remove(i) 
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run in time 

0(1 +min{i,n-i,|n/2-i|}) . 

With this running time, modifications are fast if they are near either end or near the middle 
of the list. 

Exercise 2.5. Implement a method rotate(r) that "rotates" a List so that list item i be- 
comes list item (i+r) mod n. When run on an ArrayDeque, or a DualArrayDeque, rotate(r) 
should run in 0(1 + minjr, n - r}) time. 

Exercise 2.6. Modify the ArrayDeque implementation so that the shifting done by add(i, x), 
remove(i), and resize() is done using System. arraycopy(s, i,d, j,n). 

Exercise 2.7. Modify the ArrayDeque implementation so that it does not use the % operator 
(which is expensive on some systems). Instead, it should make use of the fact that, if 
a. length is a power of 2, then k%a. length=k&(a. length - 1). (Here, & is the bitwise-and 
operator.) 

Exercise 2.8. Design and implement a variant of ArrayDeque that does not do any modu- 
lar arithmetic at all. Instead, all the data sits in a consecutive block, in order, inside an 
array. When the data overruns the beginning or the end of this array, a modified rebuild() 
operation is performed. The amortized cost of all operations should be the same as in an 
ArrayDeque. 

Hint: Making this work is really all about how a rebuild() operation is performed. 
You would like rebuild() to put the data structure into a state where the data cannot run 
off either end until at least n/2 operations have been performed. 

Test the performance of your implementation against the ArrayDeque. Optimize 
your implementation (by using System.arraycopy(a, i, b, i, n)) and see if you can get it to 
outperform the ArrayDeque implementation. 

Exercise 2.9. Design and implement a version of a RootishArxayStack that has only O(Vn) 
wasted space, but that can perform add(i, x) and remove(i, x) operations in 0(1 +min{i, n- 
i}) time. 

Exercise 2.10. Design and implement a version of a RootishArrayStack that has only 
O(Vn) wasted space, but that can perform add(i, x) and remove(i, x) operations in 0(1 + 
minji/n, n - i}) time. (For an idea on how to do this, see Section 3.3.) 

Exercise 2.11. Design and implement a version of a RootishArrayStack that has only 
O(Vn) wasted space, but that can perform add(i, x) and remove(i, x) operations in 0(1 + 
minji, Vn, n - i}) time. (See Section 3.3 for ideas on how to achieve this.) 
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Exercise 2.12. Design and implement a CubishArrayStack. This three level structure im- 
plements the List interface using at most 0(n 2/3 ) wasted space. In this structure, get(i) 
and set(i, x) take constant time; while add(i, x) and remove(i) take 0(n 1/3 ) amortized 
time. 
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Chapter 3 
Linked Lists 



In this chapter, we continue to study implementations of the List interface, this time 
using pointer-based data structures rather than arrays. The structures in this chapter are 
made up of nodes that contain the list items. The nodes are linked together into a sequence 
using references (pointers). We first study singly-linked lists, which can implement Stack 
and (FIFO) Queue operations in constant time per operation. 

Linked lists have advantages and disadvantages relative to array-based implemen- 
tations of the List interface. The primary disadvantage is that we lose the ability to access 
any element using get(i) or set(i, x) in constant time. Instead, we have to walk through 
the list, one element at a time, until we reach the ith element. The primary advantage is 
that they are more dynamic: Given a reference to any list node u, we can delete u or insert 
a node adjacent to u in constant time. This is true no matter where u is in the list. 

3.1 SLList: A Singly-Linked List 

An SLList (singly-linked list) is a sequence of Nodes. Each node u stores a data value 
u.x and a reference u.next to the next node in the sequence. For the last node w in the 
sequence, w.next = null 

SLList 



class Node { 

T x; 

Node next; 
} 



For efficiency, an SLList uses variables head and tail to keep track of the first 
and last node in the sequence, as well as an integer n to keep track of the length of the 
sequence: 



55 



3. Linked Lists 



3.1. SLList: A Singly-Linked List 
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Figure 3.1: A sequence of Queue (add(x) and remove()) and Stack (push(x) and pop()) op- 
erations on an SLList. 



SLList 



Node head; 
Node tail; 
int n; 



A sequence of Stack and Queue operations on an SLList is illustrated in Figure 3.1. 

An SLList can efficiently implement the Stack operations push() and pop() by 
adding and removing elements at the head of the sequence. The push() operation sim- 
ply creates a new node u with data value x, sets u.next to the old head of the list and 
makes u the new head of the list. Finally it increments n since the size of the SLList has 
increased by one: 

SLList 



T 


push(T 


x 


) { 






Node u 


= 


new Node 


0; 




u . x = x; 








u. next 


= 


head; 






head = 


u 


> 






il (n = 


-= 


0) 






tail 


= 


u; 






n++; 










return 


X 


; 




} 











The pop() operation, after checking that the SLList is not empty removes the head 
by setting head = head. next and decrementing n. A special case occurs when the last 
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if (n == 0) return 
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Clearly, both the push(x) and pop() operations run in 0(1) time. 
3.1.1 Queue Operations 

An SLList can also efficiently implement the FIFO queue operations add(x) and removeQ. 

Removals are done from the head of the list, and are identical to the pop() operation: 

SLList 
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Additions, on the other hand, are done at the tail of the list. In most cases, this is 
done by setting tail. next = u, where u is the newly created node that contains x. However, 
a special case occurs when n = 0, in which case tail = head = null. In this case, both tail 
and head are set to u. 







SI 1 i St 


boolean add(T 


x) { 




Node u = new Node(); 




u . x = x; 






if (n == 0) 


{ 




head = u; 






} else { 






tail .next 


= u; 




} 






tail = u; 






n++; 






return true; 






} 







Clearly, both add(x) and remove() take constant time. 
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3.1.2 Summary 

The following theorem summarizes the performance of an SLList: 

Theorem 3.1. An SLList implements the Stack and (FIFO) Queue interfaces. The push(x), 
pop(), add(x) and remove() operations run in 0(1) time per operation. 

An SLList comes very close to implementing the full set of Deque operations. The 
only missing operation is removal from the tail of an SLList. Removing from the tail of 
an SLList is difficult because it requires updating the value of tail so that it points to 
the node w that precedes tail in the SLList; this is the node w such that w.next = tail. 
Unfortunately the only way to get to w is by traversing the SLList starting at head and 
taking n - 2 steps. 

3.2 DLList: A Doubly-Linked List 

A DLList (doubly-linked list) is very similar to an SLList except that each node u in a 
DLList has references to both the node u.next that follows it and the node u.prev that 
precedes it. 

DLList 



c 


Lass 
T x; 


Node 


{ 






Node 


prev 




next; 


} 











When implementing an SLList, we saw that there were always some special cases 
to worry about. For example, removing the last element from an SLList or adding an 
element to an empty SLList requires special care so that head and tail are correctly 
updated. In a DLList, the number of these special cases increases considerably. Perhaps 
the cleanest way to take care of all these special cases in a DLList is to introduce a dummy 
node. This is a node that does not contain any data, but acts as a placeholder so that there 
are no special nodes; every node has both a next and a prev, with dummy acting as the node 
that follows the last node in the list and that precedes the first node in the list. In this way, 
the nodes of the list are (doubly-)linked into a cycle, as illustrated in Figure 3.2. 
DLList 



int n; 
Node dummy; 
DLList() { 

dummy = new Node(); 

dummy. next = dummy; 
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Figure 3.2: A DLList containing a,b,c,d,e. 



dummy . prev = dummy; 
n = 0; 



Finding the node with a particular index in a DLList is easy; we can either start at 
the head of the list (dummy. next) and work forward, or start at the tail of the list (dummy. prev) 
and work backward. This allows us to reach the ith node in 0(1 + min{i, n — i}) time: 

DLList 



Node getNode(int i) 


{ 




Node p = null; 






if (i < n / 2) { 






p = dummy. next; 






for (int j = 0; 


j 


< i; j++) 


p = p. next; 






} else { 






p = dummy; 






for (int j = n; 


j 


> i; j--) 


p = p. prev; 






} 






return (p) ; 






} 







The get(i) and set(i, x) operations are now also easy. We first find the ith node 
and then get or set its x value: 
DLList 



T get(int i) { 

return getNode(i) . x; 

} 

T set(int i, T x) { 

Node u = getNode(i); 

T y = u.x; 

u . x = x; 
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u.prev 




Figure 3.3: Adding the node u before the node w in a DLList. 



return y; 



The running time of these operations is dominated by the time it takes to find the 
ith node, and is therefore 0(1 + min{i,n - i}). 

3.2.1 Adding and Removing 

If we have a reference to a node w in a DLList and we want to insert a node u before w, then 
this is just a matter of setting u.next = w, u.prev = w.prev, and then adjusting u.prev. next 
and u.next.prev. (See Figure 3.3.) Thanks to the dummy node, there is no need to worry 
about w.prev or w.next not existing. 

DLList 



N 


3de addBefore 


(Node 


w, 


T x) 


{ 




Node u 


= new 


Node 


0; 








u . x = x; 












u. prev 


= w. p 


rev; 










u.next 


= w; 












u. next 


prev 


= u; 










u.prev 


next 


= u; 










n++; 














return 


u; 










} 















Now, the list operation add(i, x) is trivial to implement. We find the ith node in 
the DLList and insert a new node u that contains x just before it. 

DLList 



void add(int i, T x) { 

addBefore(getNode(i) , x); 
} 



60 



3. Linked Lists 3.2. DLList: A Doubly-Linked List 

The only non-constant part of the running time of add(i, x) is the time it takes to 
find the ith node (using getNode(i)). Thus, add(i, x) runs in 0(1 + minji, n - i}) time. 

Removing a node w from a DLList is easy. We need only adjust pointers at w.next 
and w.prev so that they skip over w. Again, the use of the dummy node eliminates the need 
to consider any special cases: 
DLList 



void remove (Node w) { 
w.prev. next = w.next; 
w.next.prev = w.prev; 
n--; 

} 



Now the remove(i) operation is trivial. We find the node with index i and remove 
it: 

DLList 



T 


remove 


(int 


i) 


{ 






Node w 


= ge 


tNode( 


i); 




remove 


(w); 










return 


w. x; 








} 













Again, the only expensive part of this operation is finding the ith node using 
getNode(i), so remove(i) runs in 0(1 + minji, n - i}) time. 

3.2.2 Summary 

The following theorem summarizes the performance of a DLList: 

Theorem 3.2. A DLList implements the List interface. The get(i), set(i, x), add(i, x) and 
remove(i) operations run in 0(1 + minji, n - i}) time per operation. 

It is worth noting that, if we ignore the cost of the getNode(i) operation, then all 
operations on a DLList take constant time. Thus, the only expensive part of operations on 
a DLList is finding the relevant node. Once we have the relevant node, adding, removing, 
or accessing the data at that node takes only constant time. 

This is in sharp contrast to the array-based List implementations of Chapter 2; in 
those implementations, the relevant array item can be found in constant time. However, 
addition or removal requires shifting elements in the array and, in general, takes non- 
constant time. 
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For this reason, linked list structures are well-suited to applications where refer- 
ences to list nodes can be obtained through external means. An example of this is the 
LinkedHashSet data structure found in the Java Collections Framework, in which a set of 
items is stored in a doubly-linked list and the nodes of the doubly-linked list are stored in 
a hash table (discussed in Chapter 5). When elements are removed from a LinkedHashSet, 
the hash table is used to find the relevant list node in constant time and then the list node 
is deleted (also in constant time). 

3.3 SEList: A Space-Efficient Linked List 

One of the drawbacks of linked lists (besides the time it takes to access elements that are 
deep within the list) is their space usage. Each node in a DLList requires an additional 
two references to the next and previous nodes in the list. Two of the fields in a Node are 
dedicated to maintaining the list and only one of the fields is for storing data! 

An SEList (space-efficient list) reduces this wasted space using a simple idea: 
Rather than store individual elements in a DLList, we store a block (array) containing 
several items. More precisely, an SEList is parameterized by a block size b. Each individ- 
ual node in an SEList stores a block that can hold up to b + 1 elements. 

It will turn out, for reasons that become clear later, that it will be helpful if we 

can do Deque operations on each block. The data structure we choose for this is a BDeque 

(bounded deque), derived from the ArrayDeque structure described in Section 2.4. The 

BDeque differs from the ArrayDeque in one small way: When a new BDeque is created, the 

size of the backing array a is fixed at b + 1 and it never grows or shrinks. The important 

property of a BDeque is that it allows for the addition or removal of elements at either the 

front or back in constant time. This will be useful as elements are shifted from one block 

to another. 
SEList 



class BDeque extends ArrayDeque<T> { 
BDeque () { 

super (SEList . this . type( ) ) ; 
a = newArray (b+1 ) ; 

} 

void resize( ) { } 

} 



An SEList is then a doubly-linked list of blocks: 
SEList 



class Node { 
BDeque d; 
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Node prev, next; 



SEList 



int n; 
Node dummy; 



3.3.1 Space Requirements 

An SEList places very tight restrictions on the number of elements in a block: Unless a 
block is the last block, then that block contains at least b - 1 and at most b + 1 elements. 
This means that, if an SEList contains n elements, then it has at most 

n/(b-l) + l = 0(n/b) 

blocks. The BDeque for each block contains an array of length b + 1 but, for all blocks 
except the last, at most a constant amount of space is wasted in this array. The remaining 
memory used by a block is also constant. This means that the wasted space in an SEList 
is only 0(b + n/b). By choosing a value of b within a constant factor of V" we can make the 
space-overhead of an SEList approach the V" lower bound given in Section 2.6.2. 

3.3.2 Finding Elements 

The first challenge we face with an SEList is finding the list item with a given index i. 
Note that the location of an element consists of two parts: The node u that contains the 
block that contains the element as well as the index j of the element within its block. 

SEList 



class Location { 




Node u; 




int j; 




Location(Node u, 


int j) { 


this.u = u; 




this, j = j; 




} 




} 





To find the block that contains a particular element, we proceed in the same way 
as in a DLList. We either start at the front of the list and traverse in the forward direction 
or at the back of the list and traverse backwards until we reach the node we want. The 
only difference is that, each time we move from one node to the next, we skip over a whole 
block of elements. 
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SEList 



Location getl_ocation(int i) { 
if (i < n/2) { 

Node u = dummy . next ; 
while (i >= u.d.size()) { 
i -= u . d. size( ) ; 
u = u.next; 

} 

return new Location(u, i); 
} else { 

Node u = dummy; 
int idx = n; 
while (i < idx) { 

u = u.prev; 

idx -= u . d. size( ) ; 

} 

return new l_ocation(u, i-idx); 

} 
} 



Remember that, with the exception of at most one block, each block contains at 
least b - 1 elements, so each step in our search gets us b - 1 elements closer to the element 
we are looking for. If we are searching forward, this means we reach the node we want after 
0(1 + i/b) steps. If we search backwards, we reach the node we want after 0(1 + (n - i)/b) 
steps. The algorithm takes the smaller of these two quantities depending on the value of 
i, so the time to locate the item with index i is 0(1 + min{i, n - i}/b). 

Once we know how to locate the item with index i, the get(i) and set(i, x) opera- 
tions translate into getting or setting a particular index in the correct block: 
SEList 



T get(int i) { 

Location 1 = getLocation(i) ; 
return 1. u . d. get (1. j ) ; 

} 

T set (int i, T x) { 

Location 1 = getLocation(i) ; 

T y = l.u.d.get(l.j); 

l.u.d. set (1. j ,x) ; 

return y; 
} 



The running times of these operations are dominated by the time it takes to locate 
the item, so they also run in 0(1 + minji, n - i}/b) time. 
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3.3.3 Adding an Element 

Things start to get complicated when adding elements to an SEList. Before considering 
the general case, we consider the easier operation, add(x), in which x is added to the end 
of the list. If the last block is full (or does not exist because there are no blocks yet), then 
we first allocate a new block and append it to the list of blocks. Now that we are sure that 
the last block exists and is not full, we append x to the last block. 

SEList 



boolean add(T x) { 




Node last = dummy. prev; 




if (last == dummy | [ last . d. size( ) = 


= b+1) { 


last = addBefore ( dummy ) ; 




} 




last .d.add(x) ; 




n++; 




return true; 




} 





Things get more complicated when we add to the interior of the list using add(i, x). 
We first locate i to get the node u whose block contains the ith list item. The problem is 
that we want to insert x into u's block, but we have to be prepared for the case where u's 
block already contains b + 1 elements, so that it is full and there is no room for x. 

Let U0,u 1 ,u 2 ,... denote u, u.next, u. next. next, and so on. We explore Uq,u 1 ,u 2 ,... 
looking for a node that can provide space for x. Three cases can occur during our space 
exploration (see Figure 3.4): 

1. We quickly (in r + 1 < b steps) find a node u r whose block is not full. In this case, we 
perform r shifts of an element from one block into the next, so that the free space in 
u r becomes a free space in u . We can then insert x into u 's block. 

2. We quickly (in r+1 < b steps) run off the end of the list of blocks. In this case, we add 
a new empty block to the end of the list of blocks and proceed as in the first case. 

3. After b steps we do not find any block that is not full. In this case, uo,...,u b _! is a 
sequence of b blocks that each contain b+1 elements. We insert a new block u b at 
the end of this sequence and spread the original b(b + 1) elements so that each block 
of u ,...,u b contains exactly b elements. Now u 's block contains only b elements so 
it has room for us to insert x. 
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a x b c ■*-» d e f g *-*■ h i j 



abed <-*■ e f g h 




a x b c •*-*■ d e f e ■*-*• h 



abed "•"-»■ e f g h +-*> i j k 1 




a x b c +-*■ d e f +-*> g h i >*-» j k 1 



Figure 3.4: The three cases that occur during the addition of an item x in the interior of an 
SEList. (This SEList has block size b = 3.) 



SEList 



void add(int i, T x) { 
if (i == n) { 
add( x) ; 
return; 

} 

Location 1 = getLocation(i) ; 

Node u = l.u; 

int r = 0; 

while (r < b && u != dummy && u.d.size() == b+1) { 

u = u.next; 

r++; 

} 

if (r == b) { // found b blocks each with b+1 elements 

spread(l. u) ; 

u = l.u; 

} 

if (u == dummy) { // ran off the end of the list - add new node 
u = addBefore(u) ; 

} 

while (u != l.u) { // work backwards, shifting an element at each step 

u.d.add(0, u.prev. d.remove(u.prev.d.size( )-1 ) ) ; 

u = u.prev; 
} 
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u.d. 


add(l 


], 


x); 




n++, 








} 











The running time of the add(i, x) operation depends on which of the three cases 
above occurs. Cases 1 and 2 involve examining and shifting elements through at most b 
blocks and take 0(b) time. Case 3 involves calling the spread(u) method, which moves 
b(b + 1) elements and takes 0(b 2 ) time. If we ignore the cost of Case 3 (which we will 
account for later with amortization) this means that the total running time to locate i and 
perform the insertion of x is 0(b + minji, n - i}/b). 

3.3.4 Removing an Element 

Removing an element, using the remove(i) method from an SEList is similar to adding an 
element. We first locate the node u that contains the element with index i. Now, we have 
to be prepared for the case where we cannot remove an element from u without causing 
u's block to have size less than b - 1, which is not allowed. 

Again, let uq, Ui, U2,... denote u, u.next, u. next. next, We examine uq, ^1,^2, ■■■ in 
order looking for a node from which we can borrow an element to make the size of uq's 
block larger than b - 1. There are three cases to consider (see Figure 3.5): 

1. We quickly (in r + 1 < b steps) find a node whose block contains more than b - 1 
elements. In this case, we perform r shifts of an element from one block into the 
previous, so that the extra element in u,. becomes an extra element in u . We can 
then remove the appropriate element from uo's block. 

2. We quickly (in r + 1 < b steps) run off the end of the list of blocks. In this case, u r 
is the last block, and there is no requirement that u r 's block contain at least b - 1 
elements. Therefore, we proceed as above, borrowing an element from u r to make an 
extra element in uq. If this causes u/s block to become empty, then we remove it. 

3. After b steps we do not find any block containing more than b - 1 elements. In 

this case, u ,...,u b _i is a sequence of b blocks that each contain b - 1 elements. We 

gather these b(b-l) elements into u , ...,u b _ 2 so that each of these b-1 blocks contains 

exactly b elements and we remove u b _ 1 , which is now empty. Now u 's block contains 

b elements so we can remove the appropriate element from it. 
SEList 



T remove(int i) { 

Location 1 = getLocation(i) ; 
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Figure 3.5: The three cases that occur during the removal of an item x in the interior of an 
SEList. (This SEList has block size b = 3.) 



Ty= l.u.d.get(l.j); 

Node u = l.u; 

int r = 0; 

while (r < b && u != dummy && u.d.size() == b-1) { 

u = u.next; 

r++; 

} 

if (r == b) { // found b blocks each with b-1 elements 
gather(l.u) ; 

} 

u = l.u; 

u . d.remove(l. j ) ; 

while (u.d.size() < b-1 && u.next != dummy) { 

u.d.add(u.next. d.remove(O) ) ; 

u = u . next; 

} 

if (u . d. isEmpty ( ) ) remove(u); 

n--; 

return y; 



} 



Like the add(i, x) operation, the running time of the remove(i) operation is 0(b 
minji, n - i}/b) if we ignore the cost of the gather(u) method that occurs in Case 3. 
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3.3.5 Amortized Analysis of Spreading and Gathering 

Next, we consider the cost of the gather(u) and spread(u) methods that may be executed 
by the add(i, x) and remove(i) methods. For completeness, here they are: 
SEList 



void spread(Node u) { 
Node w = u; 

for (int j = 0; j < b; j++) { 
w = w.next; 

} 

w = addBefore(w) ; 

while (w != u) { 

while (w.d.size() < b) 

w.d.add(0,w.prev. d.remove(w.prev. d.size()-1)); 
w = w.prev; 
} 
} 



SEList 



void gather(Node u) { 
Node w = u; 

for (int j = 0; j < b-1; j++) { 
while (w.d.size() < b) 

w.d.add(w.next. d.remove(O) ) ; 
w = w.next; 

} 
remove(w) ; 

} 



The running time of each of these methods is dominated by the two nested loops. 
Both the inner loop and outer loop execute at most b + 1 times, so the total running time 
of each of these methods is 0((b + l) 2 ) = 0(b 2 ). However, the following lemma shows that 
these methods execute on at most one out of every b calls to add(i, x) or remove(i). 

Lemma 3.1. If an empty SEList is created and any sequence of m > 1 calls to add(i, x) and 
remove(i) are performed, then the total time spent during all calls to spread() and gather() is 
O(bm). 

Proof. We will use the potential method of amortized analysis. We say that a node u is 
fragile if u's block does not contain b elements (so that u is either the last node, or contains 
b - 1 or b + 1 elements). Any node whose block contains b elements is rugged. Define the 
potential of an SEList as the number of fragile nodes it contains. We will consider only 
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the add(i, x) operation and its relation to the number of calls to spread(u). The analysis of 
remove(i) and gather(u) is identical. 

Notice that, if Case 1 occurs during the add(i, x) method, then only one node, u r 
has the size of its block changed. Therefore, at most one node, namely u r , goes from being 
rugged to being fragile. If Case 2 occurs, then a new node is created, and this node is 
fragile, but no other node changes sizes, so the number of fragile nodes increases by one. 
Thus, in either Case 1 or Case 2 the potential of the SEList increases by at most 1. 

Finally, if Case 3 occurs, it is because u , . . . , u b _ 1 are all fragile nodes. Then spread(«o) 
is called and these b fragile nodes are replaced with b + 1 rugged nodes. Finally, x is added 
to uq's block, making u fragile. In total the potential decreases by b - 1. 

In summary, the potential starts at (there are no nodes in the list). Each time 
Case 1 or Case 2 occurs, the potential increases by at most 1. Each time Case 3 occurs, the 
potential decreases by b - 1. The potential (which counts the number of fragile nodes) is 
never less than 0. We conclude that, for every occurrence of Case 3, there are at least b - 1 
occurrences of Case 1 or Case 2. Thus, for every call to spread(u) there are at least b calls 
to add(i, x). This completes the proof. □ 

3.3.6 Summary 

The following theorem summarizes the performance of the SEList data structure: 

Theorem 3.3. An SEList implements the List interface. Ignoring the cost of calls to spread(u) 
and gather(u), an SEList with block size b supports the operations 

• get(i) and set(i, x) in 0(1 + min{i,n - i}/b) time per operation; and 

• add(i, x) and remove(i) in 0(b + minji, n - i}/b) time per operation. 

Furthermore, beginning with an empty SEList, any sequence of m add(i, x) and remove(i) 
operations results in a total of 0(bm) time spent during all calls to spread(u) and gather(u). 

The space (measured in words) 1 used by an SEList that stores n elements is n + 0(b + 
n/b). 

The SEList is a tradeoff between an ArrayList and a DLList where the relative 
mix of these two structures depends on the block size b. At the extreme b = 2, each SEList 
node stores at most 3 values, which is really not much different than a DLList. At the other 



Recall Section 1.3 for a discussion of how memory is measured. 
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extreme, b > n, all the elements are stored in a single array, just like in an ArrayList. In 
between these two extremes lies a tradeoff between the time it takes to add or remove a 
list item and the time it takes to locate a particular list item. 

3.4 Discussion and Exercises 

Both singly-linked and doubly-linked lists are folklore, having been used in programs for 
over 40 years. They are discussed, for example, by Knuth [44, Sections 2.2.3-2.2.5]. Even 
the SEList data structure seems to be a well-known data structures exercise. 

Exercise 3.1. Why is it not possible, in an SLList to use a dummy node to avoid all the 
special cases that occur in the operations push(x), pop(), add(x), and remove()? 

Exercise 3.2. Describe and implement the List operations get(i), set(i, x), add(i, x) and 
remove(i) on an SLList. Each of these operations should run in 0(1 + i) time. 

Exercise 3.3. Design and implement SLList and DLList methods called checkSize(). These 
methods walk through the list and count the number of nodes to see if this matches the 
value, n, stored in the list. These methods return nothing, but throw an exception if the 
size they compute does not match the value of n. 

Exercise 3.4. Without referring to this chapter, try to recreate the code for the addBef ore(w) 
operation, that creates a node, u, and adds it just before the node w in a DLList. If your 
code does not exactly match the code given in this book it may still be correct. Test it and 
see if it works. 

The next few exercises involve performing manipulations on DLLists. These should 
all be done without allocating any new nodes or temporary arrays. More specifically, they 
can all be done only by changing the prev and next values of existing nodes. 

Exercise 3.5. Write a DLList method isPalindrome() that returns true if the list is a palin- 
drome, i.e., the element at position i is equal to the element at position n - i — 1 for all 
i 6 {0, . . . , n - 1}. Your code should run in O(n) time. 

Exercise 3.6. Write a method, truncate(i), that truncates a DLList at position i. After 
the execution of this method, the size of the list is i and it contains only the elements at 
indices 0, ...,i-l. The return value is another DLList that contains the elements at indices 
i,...,n- 1. This method should run in 0(min{i, n - i}) time. 

Exercise 3.7. Write a DLList method, absorb(12), that takes as an argument a DLList, 12, 
empties it and appends its contents, in order, to the receiver. For example, if 11 contains 
a,b,c and 12 contains d,e,f, then after calling H.absorb(12), 11 will contain a,b,c,d,e,f 
and 12 will be empty. 
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Exercise 3.8. Write a method deal() that removes all the elements with odd-numbered 
indices from a DLList and return a DLList containing these elements. For example, if 11, 
contains the elements a,b,c,d,e,f , then after calling U.dealQ, 11 should contain a,c,e and 
a list containing b,d,f should be returned. 
Exercise 3.9. Write a method, reverse(), that reverses the order of elements in a DLList. 

Exercise 3.10. This exercises walks you through an implementation of the merge sort al- 
gorithm for sorting a DLList, as discussed in Section 11.1.1. In your implementation, per- 
form comparisons between elements using the compareTo(x) method so that the resulting 
implementation can sort any DLList containing elements that implement the Comparable 
interface. 

1. Write a DLList method called takeFirst(12). This method takes the first node from 

12 and appends it to the the receiving list. This is equivalent to add(size(), 12.remove(0)), 
except that it should not create a new node. 

2. Write a DLList static method, merge(H,12), that takes two sorted lists 11 and 12, 
merges them, and returns a new sorted list containing the result. This causes 11 and 
12 to be emptied in the proces. For example, if 11 contains a, c,d and 12 contains 
b,e,f, then this method returns a new list containing a,b,c,d,e,f . 

3. Write a DLList method sort() that sorts the elements contained in the list using the 
merge sort algorithm. This recursive algorithm works as following: 

(a) If the list contains or 1 elements then there is nothing to do. Otherwise, 

(b) Split the list into two equal length lists 11 and 12 using the truncate(size()/2) 
method; 

(c) Recursively sort 1 1 ; 

(d) Recursively sort 12; and, finally, 

(e) Merge 11 and 12 into a single sorted list. 

The next few exercises are more advanced and require a clear understanding of 
what happens to the minimum value stored in a Stack or Queue as items are added and 
removed. 

Exercise 3.11. Design and implement a MinStack data structure that can store comparable 
elements and supports the stack operations push(x), pop(), and size(), as well as the min() 
operation, which returns the minimum value currently stored in the data structure. All 
operations should run in constant time. 
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Exercise 3.12. Design an implement a MinQueue data structure that can store comparable 
elements and supports the queue operations add(x), removeQ, and size(), as well as the 
min() operation, which returns the minimum value currently stored in the data structure. 
All operations should run in constant amortized time. 

Exercise 3.13. Design an implement a MinDeque data structure that can store compara- 
ble elements and supports the queue operations addFirst(x), addLast(x) removeFirst(), 
removel_ast() and size(), as well as the min() operation, which returns the minimum value 
currently stored in the data structure. All operations should run in constant amortized 
time. 

The next exercises are designed to test the reader's understanding of the imple- 
mentation an analysis of the space-efficient SELlst: 

Exercise 3.14. Prove that, if an SEList is used like a Stack (so that the modifications are 
done using push(x) = add(size(), x) and pop() = remove(size() - 1)) then these operations 
run in constant amortized time, independent of the value of b. 

Exercise 3.15. Design an implement of a version of an SEList that supports all the Deque 
operations in constant amortized time per operation, independent of the value of b. 
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Chapter 4 



Skiplists 



In this chapter, we discuss a beautiful data structure: the skiplist, that has a variety of 
applications. Using a skiplist we can implement a List that is fast for all the operations 
get(i), set(i, x), add(i, x), and remove(i). We can also implement an SSet in which all 
operations run in O(logn) expected time. 

Skiplists rely on randomization for their efficiency. In particular, a skiplist uses 
random coin tosses when an element is inserted to determine the height of that element. 
The performance of skiplists is expressed in terms of expected running times and lengths 
of paths. This expectation is taken over the random coin tosses used by the skiplist. In the 
implementation, the random coin tosses used by a skiplist are simulated using a pseudo- 
random number (or bit) generator. 

4.1 The Basic Structure 

Conceptually a skiplist is a sequence of singly-linked lists L ,...,L/„ where each L r con- 
tains a subset of the items in L r _\, We start with the input list L that contains n items 
and construct L l from L , L 2 from L\, and so on. The items in L r are obtained by tossing 
a coin for each element, x, in L r _ x and including x in L r if the coin comes up heads. This 
process ends when we create a list L r that is empty. An example of a skiplist is shown in 
Figure 4.1. 

For an element, x, in a skiplist, we call the height of x the largest value r such that 
x appears in L r . Thus, for example, elements that only appear in L have height 0. If we 
spend a few moments thinking about it, we notice that the height of x corresponds to the 
following experiment: Toss a coin repeatedly until the first time it comes up tails. How 
many times did it come up heads? The answer, not surprisingly, is that the expected height 
of a node is 1. (We expect to toss the coin twice before getting tails, but we don't count the 
last toss.) The height of a skiplist is the height of its tallest node. 
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Figure 4.1: A skiplist containing seven elements. 
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Figure 4.2: The search path for the node containing 4 in a skiplist. 

At the head of every list is a special node, called the sentinel, that acts as a dummy 
node for the list. The key property of skiplists is that there is a short path, called the search 
path, from the sentinel in L/, to every node in L . Remembering how to construct a search 
path for a node, u, is easy (see Figure 4.2) : Start at the top left corner of your skiplist 
(the sentinel in L/,) and always go right unless that would overshoot u, in which case you 
should take a step down into the list below. 

More precisely to construct the search path for the node u in L we start at the 
sentinel, w, in L h . Next, we examine w.next. If w.next contains an item that appears before 
u in L , then we set w = w.next. Otherwise, we move down and continue the search at the 
occurrence of w in the list L h _ l . We continue this way until we reach the predecessor of u 
inL . 

The following result, which we will prove in Section 4.4, shows that the search path 
is quite short: 

Lemma 4.1. The expected length of the search path for any node, u, in L is at most 21ogn + 
0(l) = 0(logn). 
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A space-efficient way to implement a Skiplist is to define a Node, u, as consisting 
of a data value, x, and an array, next, of pointers, where u.next[i] points to u's successor 
in the list L t . In this way, the data, x, in a node is referenced only once, even though x may 
appear in several lists. 

SkiplistSSet 



class Node<T> { 
T x; 

Node<T>[] next; 
Node(T ix, int h) { 

x = ix; 

next = (Node<T>[ ] )Array. newInstance(Node. class, h+1); 

} 

int height() { 

return next. length - 1; 
} 
} 



The next two sections of this chapter discuss two different applications of skiplists. 
In each of these applications, L stores the main structure (a list of elements or a sorted 
set of elements). The primary difference between these structures is in how a search path 
is navigated; in particular, they differ in how they decide if a search path should go down 
into L r _i or go right within L r . 

4.2 SkiplistSSet: An Efficient SSet Implementation 

A SkiplistSSet uses a skiplist structure to implement the SSet interface. When used this 
way, the list L stores the elements of the SSet in sorted order. The f ind(x) method works 
by following the search path for the smallest value y such that y > x: 

SkiplistSSet 



Node<T> findPredNode(T x) { 
Node<T> u = sentinel; 
int r = h; 
while (r >= 0) { 

while (u.next[r] != null && compare(u . next [r] . x, x) < 0) 

u = u.next[r]; // go right in list r 
r--; // go down into list r-1 

} 

return u; 

} 

T find(T x) { 

Node<T> u = findPredNode(x) ; 
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return u.next[0] == null ? null : u.next[0].x; 



Following the search path for y is easy: when situated at some node, u, in L r , we 
look right to u.next[r].x. If x > u.next[r].x, then we take a step to the right in L r , otherwise 
we move down into L [ _ 1 . Each step (right or down) in this search takes only constant time 
so, by Lemma 4.1, the expected running time of f ind(x) is O(logn). 

Before we can add an element to a SkipListSSet, we need a method to simulate 
tossing coins to determine the height, k, of a new node. We do this by picking a random 
integer, z, and counting the number of trailing Is in the binary representation of z: 1 

SkiplistSSet 



int pickHeight( ) { 

int z = rand.nextlnt ( ) ; 
int k = 0; 
int m = 1 ; 

while ((z & m) != 0) { 
k++; 
m <<= 1 ; 

} 

return k; 

} 



To implement the add(x) method in a SkiplistSSet we search for x and then splice 
x into a few lists L ,. . . ,L k , where k is selected using the pickHeightQ method. The easiest 
way to do this is to use an array, stack, that keeps track of the nodes at which the search 
path goes down from some list L r into L r _i- More precisely, stack[r] is the node in L r 
where the search path proceeded down into L r _i- The nodes that we modify to insert x are 
precisely the nodes stack[0],...,stack[k]. The following code implements this algorithm 
for add(x): 
SkiplistSSet 



boolean add(T x) { 

Node<T> u = sentinel; 
int r = h; 
int comp = 0; 
while (r >= 0) { 

while (u.next[r] != null && (comp = compare (u. next [r] . x,x) ) < 0) 



lr rhis method does not exactly replicate the coin-tossing experiment since the value of k will always be less 
than the number of bits in an int. However, this will have negligible impact unless the number of elements 
in the structure is much greater than 2 = 4294967296. 
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sentinel 



add(3.5) 



Figure 4.3: Adding the node containing 3.5 to a skiplist. The nodes stored in stack are 
highlighted. 



u = u .next [r] ; 
if (u.next[r] != null && comp == 0) return false; 
stack[r--] = u; // going down, store u 

} 

Node<T> w = new Node<T>(x, pickHeight ( ) ) ; 

while (h < w.height()) 

stack[++h] = sentinel; // increasing height of skiplist 
for (int i = 0; i < w. next . length; i++) { 

w.next[i] = stack[i] .next [ i] ; 

stack[i] .next [i ] = w; 

} 

n++; 

return true; 



} 



Removing an element, x, is done in a similar way, except that there is no need for 

stack to keep track of the search path. The removal can be done as we are following the 

search path. We search for x and each time the search moves downward from a node u, we 

check if u.next.x = x and if so, we splice u out of the list: 
SkiplistSSet 



boolean remove(T x) { 

boolean removed = false; 
Node<T> u = sentinel; 
int r = h; 
int comp = 0; 
while (r >= 0) { 

while (u.next[r] != null && (comp = compare(u. next [r] . x, x)) < 0) { 
u = u . next[r] ; 

} 

if (u.next[r] != null && comp == 0) { 
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sentinel 



remove(3) 



Figure 4.4: Removing the node containing 3 from a skiplist. 



removed = true; 

u.next[r] = u .next [r] .next [r] ; 

if (u == sentinel && u.next[r] == null) 

h--; // skiplist height has gone down 



} 

if (removed) n--; 

return removed; 



4.2.1 Summary 

The following theorem summarizes the performance of skiplists when used to implement 
sorted sets: 

Theorem 4.1. A SkiplistSSet implements the SSet interface. A SkiplistSSet supports the 
operations add(x), remove(x), and f ind(x) in O(logn) expected time per operation. 

4.3 SkiplistList: An Efficient Random-Access List Implementation 

A SkiplistList implements the List interface on top of a skiplist structure. In a Skip- 
listList, L contains the elements of the list in the order they appear in the list. Just like 
with a SkiplistSSet, elements can be added, removed, and accessed in O(logn) time. 

For this to be possible, we need a way to follow the search path for the ith element 
in L . The easiest way to do this is to define the notion of the length of an edge in some list, 
L r . We define the length of every edge in L as 1. The length of an edge, e, in L r , r > 0, 
is defined as the sum of the lengths of the edges below e in L r _ 1 . Equivalently, the length 
of e is the number of edges in L below e. See Figure 4.5 for an example of a skiplist with 
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Figure 4.5: The lengths of the edges in a skiplist. 

the lengths of its edges shown. Since the edges of skiplists are stored in arrays, the lengths 
can be stored the same way: 



class Node { 




T x; 




Node[ ] next; 




int [ ] length; 




Node(T ix, int h) { 




x = ix; 




next = (Node[ ] )Array.newInstance(Node .class, 


h+1); 


length = new int[h+1]; 




} 




int height () { 




return next. length - 1; 




} 




} 





The useful property of this definition of length is that, if we are currently at a node 
that is at position j in L and we follow an edge of length £, then we move to a node whose 
position, in L , is ]+£. In this way, while following a search path, we can keep track of the 
position, j, of the current node in L . When at a node, u, in L r , we go right if j plus the 
length of the edge u.next[r] is less than i, otherwise we go down into L c _\. 

— SkiplistList 



Node findPred(int i) { 
Node u = sentinel; 
int r = h; 

int j = -1; // the index of the current node in list 
while (r >= 0) { 

while (u.next[r] != null && j + u.length[r] < i) { 
j += u . length[r] ; 
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Figure 4.6: Adding an element to a SkiplistList. 



u = u . next [r] ; 



} 

return u; 



SkiplistList 



T get(int i) { 

return f indPred(i ) . next [0] .x ; 

} 

T set(int i, T x) { 

Node u = findPred(i) .next[0] ; 

T y = u.x; 

u . x = x; 

return y; 
} 



Since the hardest part of the operations get(i) and set(i, x) is finding the ith node 
in L , these operations run in O(logn) time. 

Adding an element to a SkiplistList at a position, i, is fairly straightforward. 
Unlike in a SkiplistSSet, we are sure that a new node will actually be added, so we can 
do the addition at the same time as we search for the new node's location. We first pick the 
height, k, of the newly inserted node, w, and then follow the search path for i. Anytime 
the search path moves down from L r with r < k, we splice w into L r . The only extra care 
needed is to ensure that the lengths of edges are updated properly. See Figure 4.6. 

Note that, each time the search path goes down at a node, u, in L r , the length of 
the edge u.next[r] increases by one, since we are adding an element below that edge at 
position i. Splicing the node w between two nodes, u and z, works as shown in Figure 4.7. 
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Figure 4.7: Updating the lengths of edges while splicing a node w into a skiplist. 



While following the search path we are already keeping track of the position, j, of u in L . 
Therefore, we know that the length of the edge from u to w is i - j . We can also deduce the 
length of the edge from w to z from the length, £, of the edge from u to z. Therefore, we 
can splice in w and update the lengths of the edges in constant time. 

This sounds more complicated than it actually is and the code is actually quite 
simple: 

— SkiplistList 



void add(int i, T x) { 

Node w = new Node(x, pickHeight ( ) ) ; 

if (w.height( ) > h) 
h = w. height ( ) ; 

add(i, w) ; 
} 



SkiplistList 

Node add(int i, Node w) { 
Node u = sentinel; 
int k = w. height ( ) ; 
int r = h; 

int j = -1; // index of u 
while (r >= 0) { 

while (u.next[r] != null && j+u . length[r] < i) { 
j += u . length[r] ; 
u = u .next [r] ; 

} 

u . length[r]++; // to account for new node in list 

if ( r <= k) { 

w.next[r] = u.next[r]; 

u. next [r] = w; 

w.length[r] = u.length[r] - (i - j); 

u.length[r] = i - j; 
} 
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Figure 4.8: Removing an element from a SkiplistList. 



r--; 

} 

n++; 

return u; 



By now, the implementation of the remove(i) operation in a SkiplistList should 
be obvious. We follow the search path for the node at position i. Each time the search path 
takes a step down from a node, u, at level r we decrement the length of the edge leaving u 
at that level. We also check if u.next[r] is the element of rank i and, if so, splice it out of 
the list at that level. An example is shown in Figure 4.8. 

SkiplistList 



T remove(int i) { 
T x = null; 
Node u = sentinel; 
int r = h; 

int j = -1; // index of node u 
while (r >= 0) { 

while (u.next[r] != null && j+u . length[r] < i) { 
j += u . length[r] ; 
u = u .next [r] ; 

} 

u . length[r]-- ; // for the node we are removing 

if (j + u.length[r] + 1 == i && u.next[r] != null) { 

x = u.next [r] . x; 

u.length[r] += u.next [r] .length[r] ; 

u.next[r] = u . next [r] .next[r] ; 

if (u == sentinel && u.next[r] == null) 
h— ; 
} 
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} 

n--; 

return x; 
} 



4.3.1 Summary 

The following theorem summarizes the performance of the SkiplistList data structure: 

Theorem 4.2. A SkiplistList implements the List interface. A SkiplistList supports the 
operations get(i), set(i,x), add(i, x), and remove(i) in O(logn) expected time per operation. 

4.4 Analysis of Skiplists 

In this section, we analyze the expected height, size, and length of the search path in a 
skiplist. This section requires a background in basic probability. Several proofs are based 
on the following basic observation about coin tosses. 

Lemma 4.2. Let T be the number of times a fair coin is tossed up to and including the first time 
the coin comes up heads. Then E[T] = 2. 

Proof. Suppose we stop tossing the coin the first time it comes up heads. Define the indi- 
cator variable 

if the coin is tossed less than i times 

1 if the coin is tossed i or more times 



h = 



Note that 7,- = 1 if and only if the first i - 1 coin tosses are tails, so E[J,] = Pr{I; = 1 } = 1/2' l . 
Observe that T, the total number of coin tosses, can be written as T = Y4L1 h- Therefore, 



E[T] = E 



J = l 

oo 

£e[i,.] 

1=1 

oo 



1 + 1/2+ 1/4+ 1/8 + --- 

2 . □ 



The next two lemmata tell us that skiplists have linear size: 
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Lemma 4.3. The expected number of nodes in a skiplist containing n elements, not including 
occurrences of the sentinel, is 2n. 

Proof. The probability that any particular element, x, is included in list L r is l/2 r , so the 
expected number of nodes in L r is n/2 r . 2 Therefore, the total expected number of nodes in 

all lists is 

^n/2 r = n(l + l/2+l/4+l/8 + ---) = 2n . □ 

r=0 

Lemma 4.4. The expected height of a skiplist containing n elements is at most log n + 2. 
Proof. For each r e {1,2, 3,..., oo}, define the indicator random variable 

{0 if L r is empty 
1 if L r is non-empty 

The height, h, of the skiplist is then given by 

oo 

1=1 
Note that I r is never more than the length, |L r |, of L r , so 

E[; r ] < E[|L r |] = n/2 r . 

Therefore, we have 



E[h] = E 



r=\ 



r=l 

LlognJ co 

= J^E[I C ]+ J^ £ [ 7 r] 

r=l r=[lognJ + l 

Llog nj oo 

r=l r=[lognJ + l 

oo 

<logn+^l/2 r 
r=0 

= logn + 2 . 



D 



"See Section 1.2.4 to see how this is derived using indicator variables and linearity of expectation. 
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Lemma 4.5. The expected number of nodes in a skiplist containing n elements, including all 
occurrences of the sentinel, is 2n + O(logn). 



Proof. By Lemma 4.3, the expected number of nodes, not including the sentinel, is 2n. 
The number of occurrences of the sentinel is equal to the height, h, of the skiplist so, 
by Lemma 4.4 the expected number of occurrences of the sentinel is at most logn + 2 = 
O(logn). □ 



Lemma 4.6. The expected length of a search path in a skiplist is at most 2 logn + 0(1). 

Proof. The easiest way to see this is to consider the reverse search path for a node, x. This 
path starts at the predecessor of x in L . At any point in time, if the path can go up a 
level, then it does. If it cannot go up a level then it goes left. Thinking about this for a few 
moments will convince us that the reverse search path for x is identical to the search path 
for x, except that it is reversed. 

The number of nodes that the reverse search path visits at a particular level, r, is 
related to the following experiment: Toss a coin. If the coin comes up heads then go up 
and stop, otherwise go left and repeat the experiment. The number of coin tosses before 
the heads then represents the number of steps to the left that a reverse search path takes 
at a particular level. 3 Lemma 4.2 tells us that the expected number of coin tosses before 
the first heads is 1. 

Let S r denote the number of steps the forward search path takes at level r that go 
to the right. We have just argued that E[S r ] < 1. Furthermore, S r < |L r |, since we can't take 
more steps in L r than the length of L r , so 

E[S r ]<E[|L r |] = n/2 r . 

We can now finish as in the proof of Lemma 4.4. Let S be the length of the search path for 



Note that this might overcount the number of steps to the left, since the experiment should end either at 
the first heads or when the search path reaches the sentinel, whichever comes first. This is not a problem since 
the lemma is only stating an upper bound. 
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some node, u, in a skiplist, and let h be the height of the skiplist. Then 



E[S] = E 



r=0 



= E[h] + ^E[S r ] 



r=0 

LlognJ oo 

= E[h]+ ^E[S r ]+ J^ E ^ 

r=0 r=[lognJ + l 

Llog nj co 

<E[h] + f 1+ Y n/2 r 

r=0 r=[lognJ + l 

LlognJ oo 

<E[h] + J\ l + J^l/2 r 

r=0 r=0 

LlognJ oo 

<E[h] + Y\ l + ^l/2 r 

r=0 r=0 

<E[h] + logn + 3 

< 21ogn + 5 . 



□ 



The following theorem summarizes the results in this section: 

Theorem 4.3. A skiplist containing n elements has expected size O(n) and the expected length 
of the search path for any particular element is at most 21ogn + 0(1). 

4.5 Discussion and Exercises 

Skiplists were introduced by Pugh [59] who also presented a number of applications of 
skiplists [58]. Since then they have been studied extensively. Several researchers have 
done very precise analysis of the expected length and variance in length of the search path 
for the ith element in a skiplist [43, 42, 56]. Deterministic versions [51], biased versions 
[7, 24], and self-adjusting versions [11] of skiplists have all been developed. Skiplist im- 
plementations have been written for various languages and frameworks and have seen use 
in open-source database systems [67, 61]. A variant of skiplists is used in the HP-UX op- 
erating system kernel's process management structures [40]. Skiplists are even part of the 
Java 1.6 API [53]. 

Exercise 4.1. Show that, during an add(x) or a remove(x) operation, the expected number 
of pointers in the structure that get changed is constant. 



4. Skiplists 4.5. Discussion and Exercises 

Exercise 4.2. Suppose that, instead of promoting an element from L;_i into L; based on 
a coin toss, we promote it with some probability p, < p < 1. Show that the expected 
length of the search path in this case is at most (l/p)log 1/ n + 0(1). What is the value of 
p that minimizes this expression? What is the expected height of the skiplist? What is the 
expected number of nodes in the skiplist? 

Exercise 4.3. Design and implement a find(x) method for SkiplistSSet that avoids locally- 
redundant comparisons; these are comparisons that have already been done and occur be- 
cause u.next[r] = u.next[r- 1]. Analyze the expected number of comparisons done by 
your modified f ind(x) method. 

Exercise 4.4. Design and implement a version of a skiplist that implements the SSet inter- 
face, but also allows fast access to elements by rank. That is, it also supports the function 
get(i), which returns the element whose rank is i in O(logn) expected time. (The rank of 
an element x in an SSet is the number of elements in the SSet that are less than x.) 

Exercise 4.5. A finger in a skiplist is an array that stores the sequence of nodes on a search 
path at which the search path goes down. (The variable stack in the add(x) code on page 78 
is a finger; the shaded nodes in Figure 4.3 show the contents of the finger.) One can think 
of a finger as pointing out the path to a node in the lowest list, L . 

A finger search implements the find(x) operation using a finger, by walking up 
the list using the finger until reaching a node u such that u.x < x and u.next = null or 
u.next.x > x and then performing a normal search for x starting from u. It is possible to 
prove that the expected number of steps required for a finger search is 0(1 + logr), where 
r is the number values in L between x and the value pointed to by the finger. 

Implement a subclass of Skiplist called SkiplistWithFinger that does all f ind(x) 
operations using an internal finger. This class stores a finger and does every search as a 
finger search. During the search it also updates the finger so that each f ind(x) operation 
uses, as a starting point, a finger that points to the result of the previous f ind(x) operation. 

Exercise 4.6. Write a method, truncate(i), that truncates a SkiplistList at position i. 
After the execution of this method, the size of the list is i and it contains only the elements 
at indices 0, ...,i- 1. The return value is another SkiplistList that contains the elements 
at indices i, . . . , n - 1 . This method should run in 0(log n) time. 

Exercise 4.7. Write a SkiplistList method, absorb(12), that takes as an argument a Skip- 
listList, 12, empties it and appends its contents, in order, to the receiver. For example, 
if 11 contains a,b,c and 12 contains d,e,f, then after calling H.absorb(12), 11 will contain 
a, b, c, d, e,f and 12 will be empty. This method should run in 0(log n) time. 
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Exercise 4.8. Using the ideas from the space-efficient linked-list, SEList, design and im- 
plement a space-efficient SSet, SESSet. Do this by storing the data, in order, in an SEList 
and then storing the blocks of this SEList in an SSet. If the original SSet implementation 
uses O(n) space to store n elements, then the SESSet will use enough space for n elements 
plus 0(n/b + b) wasted space. 

Exercise 4.9. Using an SSet as your underlying structure, design and implement an appli- 
cation that reads a (large) text file and allow you to search, interactively, for any substring 
contained in the text. 

Hint 1 : Every substring is a prefix of some suffix, so it suffices to store all suffixes of the 
text file. 

Hint 2: Any suffix can be represented compactly as a single integer indicating where the 
suffix begins in the text. 

Test your application on some large texts like some of the books available at Project Guten- 
berg [1]. 

Exercise 4.10. (This excercise is to be done after reading about binary search trees, in Sec- 
tion 6.2.) Compare skiplists with binary search trees in the following ways: 

1. Explain how removing some edges of a skiplist lead to a structure that looks like a 
binary tree and that is similar to a binary search tree. 

2. Skiplists and binary search trees each use of about the same number of pointers (2 
per node). Skiplists make better use of those pointers, though. Explain why. 
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Chapter 5 
Hash Tables 



Hash tables are an efficient method of storing a small number, n, of integers from a large 
range U — {0, ...,2 W - 1}. The term hash table includes a broad range of data structures. 
This chapter focuses on one of the most common implementations of hash tables, namely 
hashing with chaining. 

Very often hash tables store data that are not integers. In this case, an integer hash 
code is associated with each data item and this hash code is used in the hash table. The 
second part of this chapter discusses how such hash codes are generated. 

Some of the methods used in this chapter require random choices of integers in 
some specific range. In the code samples, some of these "random" integers are hard-coded 
constants. These constants were obtained using random bits generated from atmospheric 
noise. 

5.1 ChainedHashTable: Hashing with Chaining 

A ChainedHashTable data structure uses hashing with chaining to store data as an array, t, 

of lists. An integer, n, keeps track of the total number of items in all lists (see Figure 5.1): 

ChainedHashTable 



List<T>[] t; 
int n; 



The hash value of a data item x, denoted hash(x) is a value in the range {0, ..., t. length- 1}. 
All items with hash value i are stored in the list at t[i]. To ensure that lists don't get too 
long, we maintain the invariant 

n < t. length 
so that the average number of elements stored in one of these lists is n/t. length < 1. 
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Figure 5.1: An example of a ChainedHashTable with n = 14 and t. length = 16. In this 
example hash(x) = 6 



To add an element, x, to the hash table, we first check if the length of t needs to be 

increased and, if so, we grow t. With this out of the way we hash x to get an integer, i, in 

the range {0,...,t. length - 1} and we append x to the list t[i]: 
ChainedHashTable 



boolean add(T x) { 

if (find(x) != null) return false; 

if (n+1 > t. length) resize(); 

t[hash(x)] .add(x) ; 

n++; 

return true; 
} 



Growing the table, if necessary, involves doubling the length of t and reinserting all el- 
ements into the new table. This is exactly the same strategy used in the implementation 
of ArrayStack and the same result applies: The cost of growing is only constant when 
amortized over a sequence of insertions (see Lemma 2.1 on page 29). 

Besides growing, the only other work done when adding x to a ChainedHashTable 
involves appending x to the list t[hash(x)]. For any of the list implementations described 
in Chapters 2 or 3, this takes only constant time. 

To remove an element x from the hash table we iterate over the list t[hash(x)] until 

we find x so that we can remove it: 
ChainedHashTable 



T remove (T x) { 

Iterator<T> it = t [hash(x) ] . iterator( 
while (it.hasNext( ) ) { 
T y = it .next ( ) ; 
if (y . equals(x) ) { 
it .remove( ) ; 
n--; 
return y; 



92 



5. Hash Tables 5.1. ChainedHashTable: Hashing with Chaining 



} 
} 
return null; 

} 



This takes 0(n hash ( x j) time, where n ± denotes the length of the list stored at t[i]. 

Searching for the element x in a hash table is similar. We perform a linear search 
on the list t[hash(x)]: 
ChainedHashTable 



T find(Object x) { 

for (T y : t[hash(x) ] ) 
if (y . equals(x) ) 
return y; 
return null; 
} 



Again, this takes time proportional to the length of the list t[hash(x)]. 

The performance of a hash table depends critically on the choice of the hash func- 
tion. A good hash function will spread the elements evenly among the t. length lists, so 
that the expected size of the list t[hash(x)] is 0(n/t. length) = 0(1). On the other hand, a 
bad hash function will hash all values (including x) to the same table location, in which 
case the size of the list t[hash(x)] will be n. In the next section we describe a good hash 
function. 

5.1.1 Multiplicative Hashing 

Multiplicative hashing is an efficient method of generating hash values based on modular 
arithmetic (discussed in Section 2.3) and integer division. It uses the div operator, which 
calculates the integral part of a quotient, while discarding the remainder. Formally, for 
any integers a > and b > 1, adivb - \_a/b\. 

In multiplicative hashing, we use a hash table of size 2 d for some integer d (called 
the dimension). The formula for hashing an integer x 6 {0, . . . , 2 W - 1 } is 

hash(x) = ((z-x)mod2 w )div2 w - d . 

Here, z is a randomly chosen odd integer in {1,...,2 W - 1}. This hash function can be real- 
ized very efficiently by observing that, by default, operations on integers are already done 
modulo 2 W where w is the number of bits in an integer. (See Figure 5.2.) Furthermore, 
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5.1. ChainedHashTable: Hashing with Chaining 



2 W (4294967296) 

z (4102541685) 

x(42) 

z • x 

(z • x) mod 2 W 

((z-x)mod2 w )div2 w - d 



100000000000000000000000000000000 
11110100100001111101000101110101 
00000000000000000000000000101010 
10100000011110010010000101110100110010 
00011110010010000101110100110010 
00011110 



Figure 5.2: The operation of the multiplicative hash function with w = 32 and d = 8. 



integer division by 2 w ~ d is equivalent to dropping the rightmost w - d bits in a binary rep- 
resentation (which is implemented by shifting the bits right by w- d). In this way the code 
that implements the above formula is simpler than the formula itself: 

ChainedHashTable 



int hash(0bject x) { 

return (z * x . hashCode( ) ) >>> (w-d); 
} 



The following lemma, whose proof is deferred until later in this section, shows that 
multiplicative hashing does a good job of avoiding collisions: 

Lemma 5.1. Let x and y be any two values in {0,...,2 W - 1} with x ^ y. Then Prjhash(x) = 
hash(y)} < 2/2 d . 

With Lemma 5.1, the performance of remove(x), and f ind(x) are easy to analyze: 

Lemma 5.2. For any data value x, the expected length of the list t[hash(x)] is at most n x + 2, 
where n x is the number of occurrences of x in the hash table. 

Proof. Let S be the (multi-)set of elements stored in the hash table that are not equal to x. 
For an element yeS, define the indicator variable 

(1 if hash(x) = hash(y) 
otherwise 

and notice that, by Lemma 5.1, E[7 y ] < 2/2 d = 2/t. length. The expected length of the list 
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t[hash(x)] is given by 

E[t[hash(x)].size()] = E n x + ^J y 

ysS 

= n x + ^E[J y ] 
ysS 

< n x + V2/t. length 

yeS 

< n x +V2/n 

yeS 

< n x + (n-n x )2/n 

< n x + 2 , 

as required. D 

Now, we want to prove Lemma 5.1, but first we need a result from number theory. 
In the following proof, we use the notation (b r ,...,b ) 2 to denote YJi=o^i'^ 1 > where each b{ is 
a bit, either or 1. In other words, (b r ,...,b ) 2 is the integer whose binary representation 
is given by b r , . . . , b . We use * to denote a bit of unknown value. 

Lemma 5.3. Let S be the set of odd integers in {1,...,2 W - 1}, Let q and i be any two elements in 
S. Then there is exactly one value z € S such that zq mod 2 W = i. 

Proof. Since the number of choices for z and i is the same, it is sufficient to prove that there 
is at most one value z e S that satisfies zq mod 2 W = i. 

Suppose, for the sake of contradiction, that there are two such values z and z', with 
z > z'. Then 

zq mod 2 W = z'q mod 2 W = i 

So 

(z-z')^mod 2 W = 

But this means that 

(z-z')<? = fc2 w (5.1) 

for some integer k. Thinking in terms of binary numbers, we have 

(z-z')q = k-(l,0,...,0) 2 , 

w 

so that the w trailing bits in the binary representation of (z - z')q are all O's. 
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Furthermore k * since q * and z - z' * 0. Since g is odd, it has no trailing 0's in 
its binary representation: 

q = {*,..., -k,l) 2 ■ 

Since |z - z'| < 2 W , z - z' has fewer than w trailing 0's in its binary representation: 

z-z' = (*,...,*,l,0,...,0) 2 . 

<w 

Therefore, the product (z - z')q has fewer than w trailing 0's in its binary representation: 

(z-z')<? = (*,---,*,l,0,...,0) 2 . 

<w 

Therefore (z - z')q cannot satisfy (5.1), yielding a contradiction and completing the proof. 

□ 

The utility of Lemma 5.3 comes from the following observation: If z is chosen 
uniformly at random from S, then zt is uniformly distributed over S. In the following 
proof, it helps to think of the binary representation of z, which consists of w - 1 random 
bits followed by a 1 . 

Proof of Lemma 5.1. First we note that the condition hash(x) = hash(y) is equivalent to the 
statement "the highest-order d bits of zx mod 2 W and the highest-order d bits of zy mod 2 W 
are the same." A necessary condition of that statement is that the highest-order d bits in 
the binary representation of z(x - y) mod 2 W are either all 0's or all l's. That is, 

z(x-y)mod 2 W = (0,...,0,*,...,*) 2 (5.2) 

d w-d 

when zx mod 2 W > zy mod 2 W or 

z(x-y)mod 2 W = (1,...,1,*,...,*) 2 . (5.3) 

d w-d 

when zx mod 2 W < zy mod 2 W . Therefore, we only have to bound the probability that z(x - 
y) mod 2 W looks like (5.2) or (5.3). 

Let q be the unique odd integer such that (x-y) mod 2 W = q2 r for some integer r > 0. 
By Lemma 5.3, the binary representation of zq mod 2 W has w- 1 random bits, followed by 
a 1: 

zq mod 2 W = (b w _ 1 ,...,b 1 ,l) 2 

w-l 
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Therefore, the binary representation of z(x-y) mod 2 W = zq2 r mod 2 W has w-r-1 random 
bits, followed by a 1, followed by r O's: 

z(x-y) mod 2 W = z^2 r mod 2 W = {b^ l _ r _ l ,...,b l , 1,0,0,..., 0) 2 

w-r-1 r 

We can now finish the proof: If r > w - d, then the d higher order bits of z(x - y) mod 2 W 
contain both O's and l's, so the probability that z(x - y) mod 2 W looks like (5.2) or (5.3) is 
0. If r = w - d, then the probability of looking like (5.2) is 0, but the probability of looking 
like (5.3) is l/2 d_1 = 2/2 d (since we must have bi,...,b^_i = 1,...,1). If r < w- d then we 
must have b w _ r _i,...,b w _ r _ d = 0,...,0 or b w _ r _i,...,b w _ r _ d = 1,...,1. The probability of each 
of these cases is l/2 d and they are mutually exclusive, so the probability of either of these 
cases is 2/2 d . This completes the proof. □ 

5.1.2 Summary 

The following theorem summarizes the performance of the ChainedHashTable data struc- 
ture: 

Theorem 5.1. A ChainedHashTable implements the USet interface. Ignoring the cost of calls 
to grow(), a ChainedHashTable supports the operations add(x), remove(x), and find(x) in 
O(l) expected time per operation. 

Furthermore, beginning with an empty ChainedHashTable, any sequence of m add(x) 
and remove(x) operations results in a total ofO(m) time spent during all calls to grow(). 

5.2 LinearHashTable: Linear Probing 

The ChainedHashTable data structure uses an array of lists, where the ith list stores all 
elements x such that hash(x) = i. An alternative, called open addressing is to store the 
elements directly in an array, t, with each array location in t storing at most one value. 
This is the approach taken by the LinearHashTable described in this section. In some 
places, this data structure is described as open addressing with linear probing. 

The main idea behind a LinearHashTable is that we would, ideally, like to store 
the element x with hash value i = hash(x)in the table location t[i]. If we can't do this 
(because some element is already stored there) then we try to store it at location t[(i + 
1) mod t. length]; if that's not possible, then we try t[(i + 2) mod t. length], and so on, 
until we find a place for x. 

There are three types of entries stored in t: 
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1. data values: actual values in the USet that we are representing; 

2. null values: at array locations where no data has ever been stored; and 

3. del values: at array locations where data was once stored but that has since been 
deleted. 

In addition to the counter, n, that keeps track of the number of elements in the Linear- 
HashTable, a counter, q, keeps track of the number of elements of Types 1 and 3. That is, 
q is equal to n plus the number of del values in t. To make this work efficiently, we need t 
to be considerably larger than q, so that there are lots of null values in t. The operations 
on a LinearHashTable therefore maintain the invariant that t. length > 2q. 

To summarize, a LinearHashTable contains an array, t, that stores data elements, 
and integers n and q that keep track of the number of data elements and non-null values 
of t, respectively. Because many hash functions only work for table sizes that are a power 
of 2, we also keep an integer d and maintain the invariant that t. length = 2 d . 
LinearHashTable 



T[] t; // the table 

int n; // the size 

int d; // t. length = 2~d 

int q; // number of non-null entries in t 



The f ind(x) operation in a LinearHashTable is simple. We start at array entry t[i] 

where i = hash(x) and search entries t[i], t[(i + 1) mod t. length], t[(i + 2) mod t. length], 

and so on, until we find an index i' such that, either, t[i'] = x, or t[i'] = null. In the former 

case we return t[i']. In the latter case, we conclude that x is not contained in the hash table 

and return null. 
LinearHashTable 



T find(T x) { 
int i = hash(x) ; 
while (t[i] != null) { 

if (t[i] != del && x . equals( t [i] ) ) return t[i]; 

i = (i == t. length- 1) ? : i + 1; // increment i (mod t. length) 

} 

return null; 

} 



The add(x) operation is also fairly easy to implement. After checking that x is 
not already stored in the table (using find(x)), we search t[i], t[(i + 1) mod t. length], 
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t[(i + 2) mod t. length], and so on, until we find a null or del and store x at that location, 

increment n, and q, if appropriate.: 
LinearHashTable 



boolean add(T x) { 

if (find(x) != null) return false; 

if (2*(q+1) > t. length) resize(); // max 50% occupancy 

int i = hash(x) ; 

while (t[i] != null && t[i] != del) 

i = (i == t.length-1) ? : i + 1; // increment i (mod t. length) 
if (t[i] == null) q++; 
n++; 

t[i] = x; 
return true; 

} 



By now, the implementation of the remove(x) operation should be obvious. We 

search t[i], t[(i + 1) mod t. length], t[(i + 2) mod t. length], and so on until we find an 

index i' such that t[i'] = xor t[i'] = null. In the former case, we set t[i'] = del and return 

true. In the latter case we conclude that x was not stored in the table (and therefore cannot 

be deleted) and return false. 
LinearHashTable 



T remove (T x) { 
int i = hash(x) ; 
while (t[i] != null) { 
T y = t[i]; 

if (y != del && x.equals(y)) { 
t[i] = del; 
n--; 

if (8*n < t. length) resize(); // min 12.5% occupancy 
return y; 

} 

i = (i == t.length-1) ? : i + 1; // increment i (mod t. length) 

} 

return null; 

} 



The correctness of the f ind(x), add(x), and remove(x) methods is easy to verify, 
though it relies on the use of del values. Notice that none of these operations ever sets 
a non-null entry to null. Therefore, when we reach an index i' such that t[i'] = null, 
this is a proof that the element, x, that we are searching for is not stored in the table; t[i'] 
has always been null, so there is no reason that a previous add(x) operation would have 
proceeded beyond index i'. 
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The resize() method is called by add(x) when the number of non-null entries 

exceeds n/2 or by remove(x) when the number of data entries is less than t. length/8. The 

resizeQ method works like the resize() methods in other array-based data structures. 

We find the smallest non-negative integer d such that 2 d > 3n. We reallocate the array t 

so that it has size 2 d and then we insert all the elements in the old version of t into the 

newly-resized copy of t. While doing this we reset q equal to n since the newly-allocated 

t has no del values. 
LinearHashTable 



void resize( ) { 
d = 1; 

while ((1<<d) < 3*n) d++; 
T[] told = t; 
t = newArray ( 1<<d) ; 
q = n; 

// insert everything in told 
for (int k = 0; k < told. length; k++) { 
if (told[k] != null && told[k] != del) { 
int i = hash(told[k]); 
while (t[i] != null) 

i = (i == t. length- 1) ? : i + 1; 
t[i] = told[k]; 
} 
} 
} 



5.2.1 Analysis of Linear Probing 

Notice that each operation, add(x), remove(x), or f ind(x), finishes as soon as (or before) it 
discovers the first null entry in t. The intuition behind the analysis of linear probing is 
that, since at least half the elements in t are equal to null, an operation should not take 
long to complete because it will very quickly come across a null entry. We shouldn't rely 
too heavily on this intuition though, because it would lead us to (the incorrect) conclusion 
that the expected number of locations in t examined by an operation is at most 2. 

For the rest of this section, we will assume that all hash values are independently 
and uniformly distributed in {0,...,t. length - 1}. This is not a realistic assumption, but it 
will make it possible for us to analyze linear probing. Later in this section we will describe 
a method, called tabulation hashing, that produces a hash function that is "good enough" 
for linear probing. We will also assume that all indices into the positions of t are taken 
modulo t. length, so that t[i] is really a shorthand for t[i mod t. length]. 

We say that a run of length k that starts at i occurs when t[i], t[i+ 1 ],..., t[i + k — 1] 
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are all non-null and t[i - 1] = t[i + k] — null. The number of non-null elements of t 
is exactly q and the add(x) method ensures that, at all times, q < t. length/2. There are q 
elements x !,..., x q that have been inserted into t since the last rebuild() operation. By our 
assumption, each of these has a hash value, hash(xy), that is uniform and independent of 
the rest. With this setup, we can prove the main lemma required to analyze linear probing. 

Lemma 5.4. Fix a value i e {0, ...,t. length - 1}. Then the probability that a run of length k 
starts at i is 0(c k ) for some constant < c < 1. 

Proof. If a run of length k starts at i, then there are exactly k elements x,- such that 
hash(x.) 6 {i,...,l + k - 1}. The probability that this occurs is exactly 

q\/ k \ j t. length -k xq 



Pk i , 1 1 ii 

\/c/\ t. length / \ t. length 

since, for each choice of k elements, these k elements must hash to one of the k locations 
and the remaining q - k elements must hash to the other t. length - k table locations. 1 

In the following derivation we will cheat a little and replace r! with (r/e) r . Stirling's 
Approximation (Section 1.2.2) shows that this is only a factor of O(^) from the truth. 
This is just done to make the derivation simpler; Exercise 5.3 asks the reader to redo the 
calculation more rigorously using Stirling's Approximation in its entirety. 

The value of p^ is maximized when t. length is minimum, and the data structure 
maintains the invariant that t. length > 2q, so 



Pk 



q \(k\ k (2 q -k^- k 



A:/\2q/ \ 2q 

q! \( k Vl2q-k 



(q-Jt)!Jk!/\2q/ \ 2q 

q^ W k \ k l2q-k xq ~ k 



(q-k)Q- k k k l\2qJ \ 2q 

q*ql- fc \( k \ k t2q-k\ q ~ k 



[Stirling's approximation] 



{q-k)*- k k k }\2q} \ 2q 
qk^(q(2q-k)^ k 



2qk) \2q(q-k) 

kl (2q-k)^ k 



(I) 



2(q-fc) 



Note that pj- is greater than the probability that a run of length k starts at i, since the definition of pjt does 
not include the requirement t[i - 1] = t[i + k] = null. 



101 



5. Hash Tables 



5.2. LinearHashTable: Linear Probing 



■(if 



2(q-fc) 



q-k 



V^ 



(In the last step, we use the inequality (1 + l/x) x < e, which holds for all x > 0.) Since 
Ve/2 < 0.824360636 < 1, this completes the proof. □ 

Using Lemma 5.4 to prove upper-bounds on the expected running time of f ind(x), 
add(x), and remove(x) is now fairly straight-forward. Consider the simplest case, where we 
execute f ind(x) for some value x that has never been stored in the LinearHashTable. In 
this case, i = hash(x) is a random value in {0, ..., t. length- 1} independent of the contents 
of t. If i is part of a run of length k then the time it takes to execute the f ind(x) operation 
is at most 0(1 + k). Thus, the expected running time can be upper-bounded by 

( t. length oo 

1 + 1 J y y fcPrji is part of a run of length k) 



O 



(=1 k=0 

Note that each run of length k contributes to the inner sum k times for a total contribution 
of k , so the above sum can be rewritten as 

/ _ t. length oo \ 



o 



<0 



1 + 1 I ) ) fc 2 Prli starts a run of length k\ 

\t. length/ Z— Z— ' & 
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t. length oo 

i+ (—*—) y y* 

\t. length/ Z_ L^ 
1 + ]^k 2 p k 



i=l k=0 



Pk 



k=0 



o 



= 0(1) 



Yjc 2 -0(c 



k=0 



The last step in this derivation comes from the fact that HjtLg ^ 2 ' 0(c k ) is an exponentially 
decreasing series. 2 Therefore, we conclude that the expected running time of the f ind(x) 
operation for a value x that is not contained in a LinearHashTable is 0(1). 

If we ignore the cost of the resize() operation, the above analysis gives us all we 
need to analyze the cost of operations on a LinearHashTable. 



In the terminology of many calculus texts, this sum passes the ratio test: There exists a positive integer fcn 
such that, for all k > k , ( * + ^f +1 < 1. 
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First of all, the analysis of f ind(x) given above applies to the add(x) operation when 
x is not contained in the table. To analyze the f ind(x) operation when x is contained in 
the table, we need only note that this is the same as the cost of the add(x) operation that 
previously added x to the table. Finally, the cost of a remove(x) operation is the same as 
the cost of a f ind(x) operation. 

In summary, if we ignore the cost of calls to resize(), all operations on a Linear- 
HashTable run in 0(1) expected time. Accounting for the cost of resize can be done using 
the same type of amortized analysis performed for the ArxayStack data structure in Sec- 
tion 2.1. 

5.2.2 Summary 

The following theorem summarizes the performance of the LinearHashTable data struc- 
ture: 

Theorem 5.2. A LinearHashTable implements the USet interface. Ignoring the cost of calls 
to resizeQ, a LinearHashTable supports the operations add(x), remove(x), and f ind(x) in 
O(l) expected time per operation. 

Furthermore, beginning with an empty LinearHashTable, any sequence of m add(x) 
and remove(x) operations results in a total ofO(m) time spent during all calls to resize(). 

5.2.3 Tabulation Hashing 

While analyzing the LinearHashTable structure, we made a very strong assumption: That 
for any set of elements, {x-|,..., x n }, the hash values hash(x 1 ), ...,hash(x n ) are independently 
and uniformly distributed over {0,...,t. length - 1}. One way to imagine getting this is to 
have a giant array, tab, of length 2 W , where each entry is a random w-bit integer, indepen- 
dent of all the other entries. In this way, we could implement hash(x) by extracting a d-bit 

integer from tab[x.hashCode()]: 
LinearHashTable 



int idealHash(T x) { 

return tab[x .hashCode( ) >>> w-d]; 
} 



Unfortunately, storing an array of size 2 W is prohibitive in terms of memory usage. 
The approach used by tabulation hashing is to, instead, treat w-bit integers as being com- 
prised of w/r integers, each having only r bits. In this way, tabulation hashing only needs 
w/r arrays each of length 2 r . All the entries in these arrays are independent w-bit inte- 
gers. To obtain the value of hash(x) we split x.hashCode() up into w/r r-bit integers and 
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use these as indices into these arrays. We then combine all these values with the bitwise 

exclusive-or operator to obtain hash(x). The following code shows how this works when 

w = 32 and r = 4: 
LinearHashTable 



int hash(T x) { 

int h = x . hashCode( ) ; 
return ( tab[0] [h&Oxff ] 

* tab[1][(h>»8)&0xff] 

* tab[2][(h>»16)&0xff] 

" tab[3][(h>»24)&0x££]) 
>>> (w-d); 
} 



In this case, tab is a 2-dimensional array with 4 columns and 2 32/4 = 256 rows. 

One can easily verify that, for any x, hash(x) is uniformly distributed over {0, . . . , 2 d - 
1}. With a little work, one can even verify that any pair of values have independent hash 
values. This implies tabulation hashing could be used in place of multiplicative hashing 
for the ChainedHashTable implementation. 

However, it is not true that any set of n distinct values gives a set of n independent 
hash values. Nevertheless, when tabulation hashing is used, the bound of Theorem 5.2 
still holds. References for this are provided at the end of this chapter. 

5.3 Hash Codes 

The hash tables discussed in the previous section are used to associate data with integer 
keys consisting of w bits. In many cases, we have keys that are not integers. They may be 
strings, objects, arrays, or other compound structures. To use hash tables for these types 
of data, we must map these data types to w-bit hash codes. Hash code mappings should 
have the following properties: 

1. If x and y are equal, then x.hashCode() and y.hashCode() are equal. 

2. If x and y are not equal, then the probability that x.hashCode() = y.hashCodeQ should 
be small (close to 1/2 W ). 

The first property ensures that if we store x in a hash table and later look up a 
value y equal to x, then we will find x — as we should. The second property minimizes the 
loss from converting our objects to integers. It ensures that unequal objects usually have 
different hash codes and so are likely to be stored at different locations in our hash table. 
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5.3.1 Hash Codes for Primitive Data Types 

Small primitive data types like char, byte, int, and float are usually easy to find hash 
codes for. These data types always have a binary representation and this binary represen- 
tation usually consists of w or fewer bits. (For example, in Java, byte is an 8-bit type and 
float is a 32-bit type.) In these cases, we just treat these bits as the representation of an 
integer in the range {0, ...,2 W - 1}. If two values are different, they get different hash codes. 
If they are the same, they get the same hash code. 

A few primitive data types are made up of more than w bits, usually cw bits for 
some constant integer c. (Java's long and double types are examples of this with c = 2.) 
These data types can be treated as compound objects made of c parts, as described in the 
next section. 

5.3.2 Hash Codes for Compound Objects 

For a compound object, we want to create a hash code by combining the individual hash 
codes of the object's constituent parts. This is not as easy as it sounds. Although one can 
find many hacks for this (for example, combining the hash codes with bitwise exclusive-or 
operations), many of these hacks turn out to be easy to foil (see Exercises 5.6-5.8). How- 
ever, if one is willing to do arithmetic with 2w bits of precision, then there are simple and 
robust methods available. Suppose we have an object made up of several parts Pq,...,P t _i 
whose hash codes are x , . . . , x r _i . Then we can choose mutually independent random w-bit 
integers z , ...,z r _ 1 and a random 2w-bit odd integer z and compute a hash code for our 
object with 

7 r-l \ \ 

T-Y Z i X i mod 22W div2W • 

> V !=0 / I 

Note that this hash code has a final step (multiplying by z and dividing by 2 W ) that uses 

the multiplicative hash function from Section 5.1.1 to take the 2w-bit intermediate result 

and reduce it to a w-bit final result. Here is an example of this method applied to a simple 

compound object with 3 parts xO, x1, and x2: 

Point3D 



h(x ,...,x r _ 1 ) = 



int hashCode( ) { 

long[] z = {0x2058cc50L, 0xcb19137el_, 0x2cb6b6fdl_} ; // random 
long zz = 0xbea0107e5067d19dl_; // random 

long hO = x0.hashCode( ) & ((1L«32)-1); // unsigned int to long 
long hi = x1.hashCode() & ( ( 1L«32)-1 ) ; 
long h2 = x2.hashCode( ) & ( ( 1I_«32)-1 ) ; 
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return (int) ( ( (z[0]*hO + z[1]*h1 + z[2]*h2)*zz) >» 32); 



The following theorem shows that, in addition to being straightforward to implement, this 
method is provably good: 

Theorem 5.3. Let x ,...,x r _ 1 and y , ...,y r _ 1 each be sequences o/w bit integers in {0, ...,2 W -1} 
and assume x,- * yj for at least one index i 6 {0, ...,r - 1}. T/zen 



Pr{/z(x ,...,x r _ 1 ) = % ---yr-l)}<3/2 w . 

Proof. We will first ignore the final multiplicative hashing step and see how that step con- 
tributes later. Define: 

I r-l N 



h'(xQ,...,x r _ x ) 



\i=o 



mod 2 2w . 



Suppose that h'(x ,...,x r _i) = h'(yQ,...,y r _i). We can rewrite this as: 



z;(x;-y,)mod 2 zw = f 



(5.4) 



where 



i-i 



r-l 



' = L Z i^i ~ x i )+ Z_ z i (y i " x / -) mod 2 ^ 

=0 ;=i + l 

If we assume, without loss of generality that x,- > y,-, then (5.4) becomes 



Zt(x»-yi) = * , 



(5.5) 



since each of z,- and (x^-y,) is at most 2 W -1, so their product is at most 2 2w -2 w+1 + l < 2 2w -l. 
By assumption, x,- - y,- * 0, so (5.5) has at most one solution in z,-. Therefore, since z,- and 
t are independent (z ,...,z r _i are mutually independent), the probability that we select z, 
so that ^ / (xo,...,x r _i) = h'(y ,...,y r _i) i s at most 1/2 W . 

The final step of the hash function is to apply multiplicative hashing to reduce 
our 2w-bit intermediate result h'(x , ..., x r _i) to a w-bit final result h(x ,...,x r _i). By Theo- 
rem 5.3, if h'{xQ,...,x r _\) * h'{y ,...,y r _x), then Pr{^(xo,...,x r _!) = h(y q, . . . ,y r _ x )} < 2/2 w . 
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To summarize, 

Pr ( h{xQ,...,x r . x ) ) 

\ = h(y Q ,...,y r _ 1 ) ) 

h'{x ,...,x r _ 1 ) = h'{y ,...,y r _ 1 )ot 
Pri h'(x ,...,x r _ 1 )*h'(y ,...,y r _ l ) 

and z/j / (x ,...,x r _ 1 )div2 w = zfr / (y ,...,y r _ 1 )div2 w 

< 1/2 W + 2/2 w = 3/2 w . □ 

5.3.3 Hash Codes for Arrays and Strings 

The method from the previous section works well for objects that have a fixed, constant, 
number of components. However, it breaks down when we want to use it with objects 
that have a variable number of components since it requires a random w-bit integer z, for 
each component. We could use a pseudorandom sequence to generate as many z,'s as we 
need, but then the z,'s are not mutually independent, and it becomes difficult to prove that 
the pseudorandom numbers don't interact badly with the hash function we are using. In 
particular, the values of t and z, in the proof of Theorem 5.3 are no longer independent. 

A more rigorous approach is to base our hash codes on polynomials over prime 
fields; these are just regular polynomials that are evaluated modulo some prime number, 
p. This method is based on the following theorem, which says that polynomials over prime 
fields behave pretty-much like usual polynomials: 

Theorem 5.4. Let p be a prime number, and let f(z) — x z° + x 1 z 1 + h x r _ 1 z r ~ l be a non- 
trivial polynomial with coefficients Xj 6 {0,...,p - 1}. Then the equation f(z) mod p = has at 
most r-\ solutions for z 6 {0, ...,p-l). 

To use Theorem 5.4, we hash a sequence of integers Xq, ...,x r _j with each x,- e 
{0, ...,p - 2} using a random integer z 6 {0,...,p- 1} via the formula 

^(x ,...,x f _!) = (x z° + --- + x^z^ 1 + (p- l)z r ) mod p . 

Note the extra (p - l)z r term at the end of the formula. It helps to think of (p - 1) 
as the last element, x,., in the sequence x ,...,x r . Note that this element differs from every 
other element in the sequence (each of which is in the set {0,...,p - 2}). We can think of 
p - 1 as an end-of-sequence marker. 

The following theorem, which considers the case of two sequences of the same 
length, shows that this hash function gives a good return for the small amount of random- 
ization needed to choose z: 
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Theorem 5.5. Let p > 2 W + 1 be a prime, let x ,...,x r _i and y §,..., y r -\ each be sequences of 
w-bit integers in {0, ...,2 W - 1}, and assume x,- * yifor at least one index i 6 {0,...,r- 1}. Then 

Pr{h(x ,...,Xr_i) = /j(y ,...,y r _i)} < (r- l)/p} . 

Proof. The equation /j(xQ,...,x r _i) = fr(yo>--->yr-l) can De rewritten as 

((xo-yo) z ° + --- + ( x r-i-y ) --i)z^ 1 )modp = 0. (5.6) 

Since x A * y 1; this polynomial is non-trivial. Therefore, by Theorem 5.4, it has at most r- 1 
solutions in z. The probability that we pick z to be one of these solutions is therefore at 
most (r- l)/p. □ 

Note that this hash function also deals with the case in which two sequences have 
different lengths, even when one of the sequences is a prefix of the other. This is because 
this function effectively hashes the infinite sequence 

x ,...,x r _ 1 ,p-l,0,0,... . 

This guarantees that if we have two sequences of length r and r' with r > r' , then these two 
sequences differ at index i - r. In this case, (5.6) becomes 

(i=r'-\ i-r-l \ 

) ( x ! -Yi) z ' + (x r '-p + l)z r + V x,z ! + (p- l)z r mod p = , 

i=0 i=r'+l , 

which, by Theorem 5.4, has at most r solutions in z. This combined with Theorem 5.5 
suffice to prove the following more general theorem: 

Theorem 5.6. Let p > 2 W + 1 be a prime, let xo,...,x r _i and yo/.--/Yr'-l be distinct sequences of 
w-bit integers in {0, . . . , 2 W - 1 }. Then 

Pr{/i(x , . . . , x r _x) = h(y , ..., y r _ : )} < maxjr, r'}/p . 

The following example code shows how this hash function is applied to an object 

that contains an array, x, of values: 

GeomVector 



int hashCode( ) { 

long p = (1L«32)-5; // prime: 2~32 - 5 

long z = 0x64b6055al_; // 32 bits from random.org 

int z2 = 0x5067d19d; // random odd 32 bit number 

long s = 0; 

long zi = 1 ; 
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for (int i = 0; i < x. length; i++) { 

long xi = (x[i] .hashCode( ) * z2) >>> 1; // reduce to 31 bits 
s = (s + zi * xi) % p; 
zi = (zi * z) % p; 

} 

s = (s + zi * (p-1 ) ) % p; 
return (int)s; 
} 



The above code sacrifices some collision probability for implementation conve- 
nience. In particular, it applies the multiplicative hash function from Section 5.1.1, with 
d = 31 to reduce x[i].hashCode() to a 31-bit value. This is so that the additions and mul- 
tiplications that are done modulo the prime p = 2 32 - 5 can be carried out using unsigned 
63-bit arithmetic. This means that the probability of two different sequences, the longer 
of which has length r, having the same hash code is at most 

2/2 31 + r/(2 32 -5) 

rather than the r/(2 32 - 5) specified in Theorem 5.6. 

5.4 Discussion and Exercises 

Hash tables and hash codes are an enormous and active area of research that is just touched 
upon in this chapter. The online Bibliography on Hashing [9] contains nearly 2000 entries. 

A variety of different hash table implementations exist. The one described in Sec- 
tion 5.1 is known as hashing with chaining (each array entry contains a chain (List) of 
elements). Hashing with chaining dates back to an internal IBM memorandum authored 
by H. P. Luhn and dated January 1953. This memorandum also seems to be one of the 
earliest references to linked lists. 

An alternative to hashing with chaining is that used by open addressing schemes, 
where all data is stored directly in an array. These schemes include the LinearHashTable 
structure of Section 5.2. This idea was also proposed, independently, by a group at IBM 
in the 1950s. Open addressing schemes must deal with the problem of collision resolution: 
the case where two values hash to the same array location. Different strategies exist for 
collision resolution and these provide different performance guarantees and often require 
more sophisticated hash functions than the ones described here. 

Yet another category of hash table implementations are the so-called perfect hashing 
methods. These are methods in which f ind(x) operations take 0(1) time in the worst-case. 
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For static data sets, this can be accomplished by finding perfect hash functions for the data; 
these are functions that map each piece of data to a unique array location. For data that 
changes over time, perfect hashing methods include FKS two-level hash tables [27, 22] and 
cuckoo hashing [55]. 

The hash functions presented in this chapter are probably among the most practi- 
cal currently known methods that can be proven to work well for any set of data. Other 
provably good methods date back to the pioneering work of Carter and Wegman who in- 
troduced the notion of universal hashing and described several hash functions for different 
scenarios [13]. Tabulation hashing, described in Section 5.2.3, is due to Carter and Weg- 
man [13], but its analysis, when applied to linear probing (and several other hash table 
schemes) is due to Patra§cu and Thorup [60]. 

The idea of multiplicative hashing is very old and seems to be part of the hashing 
folklore [46, Section 6.4]. However, the idea of choosing the multiplier z to be a random 
odd number, and the analysis in Section 5.1.1 is due to Dietzfelbinger et al. [21]. This 
version of multiplicative hashing is one of the simplest, but its collision probability of 2/2 d 
is a factor of 2 larger than what one could expect with a random function from 2 W — > 2 d . 
The multiply-add hashing method uses the function 

h(x) = ((zx + b) mod 2 2w )div2 2w ~ d 

where z and b are each randomly chosen from {0, ...,2 2w - 1}. Multiply-add hashing has a 
collision probability of only l/2 d [19], but requires 2w-bit precision arithmetic. 

There are a number of methods of obtaining hash codes from fixed-length se- 
quences of w-bit integers. One particularly fast method [10] is the function 

'r/2-l \ 

^(x ,...,x f _!) = ^((x 2 / + a 2 ;) mod 2 w )((x 2i+1 + a 2/+1 ) mod 2 W ) mod 2 2w 

V 1 = 

where r is even and a ,...,& r _i are randomly chosen from {0, ...,2 W ). This yields a 2w-bit 
hash code that has collision probability 1/2 W . This can be reduced to a w-bit hash code 
using multiplicative (or multiply-add) hashing. This method is fast because it requires 
only r/2 2w-bit multiplications whereas the method described in Section 5.3.2 requires r 
multiplications. (The mod operations occur implicitly by using w and 2w-bit arithmetic 
for the additions and multiplications, respectively.) 

The method from Section 5.3.3 of using polynomials over prime fields to hash 
variable-length arrays and strings is due to Dietzfelbinger et al. [20]. It is, unfortunately, 
not very fast. This is due to its use of the mod operator which relies on a costly machine 
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instruction. Some variants of this method choose the prime p to be one of the form 2 W -1, in 
which case the mod operator can be replaced with addition (+) and bitwise-and (&) opera- 
tions [45, Section 3.6]. Another option is to apply one of the fast methods for fixed-length 
strings to blocks of length c for some constant c> 1 and then apply the prime field method 
to the resulting sequence of \r/c] hash codes. 

Exercise 5.1. A certain university assigns its students student numbers the first time they 
register for any course. These numbers are sequential integers that started at many years 
ago and are now in the millions. Suppose we have a class of 100 first year students and 
we want to assign them hash codes based on their student numbers. Does it make more 
sense to use the first two digits or the last two digits of their student number? Justify your 
answer. 

Exercise 5.2. Prove that the bound 2/2 d in Lemma 5.1 is the best possible by showing 
that, if x = 2 w ~ d ~ 2 and y = 3x, then Pr{hash(x) = hash(y)} = 2/2 d . (Hint look at the binary 
representations of zx and z3x and use the fact that z3x = zx+2zx.) 

Exercise 5.3. Reprove Lemma 5.4 using the full version of Stirling's Approximation given 
in Section 1.2.2. 

Exercise 5.4. Consider the following the simplified version of the code for adding an ele- 
ment x to a LinearHashTable. This code simply stores x in the first null array entry it 
finds. Explain why this could be very slow by giving an example of a sequence of O(n) 
add(x), remove(x), and f ind(x) operations that would take on the order of n 2 time to exe- 
cute. 
LinearHashTable 



boolean addSlow(T x) { 

if (2*(q+1) > t. length) resize(); // max 50% occupancy 
int i = hash(x) ; 
while (t[i] != null) { 

if (t[i] != del && x . equals( t [i] ) ) return false; 

i = (i == t.length-1) ? : i + 1; // increment i (mod t. length) 

} 

t[i] = x; 
n++; q++; 
return true; 
} 



Exercise 5.5. Early versions of the Java hashCodeQ method for the String class worked by 
not using all characters of long strings. For example, for a 16 character string, the hash 
code was computed using only the 8 even-indexed characters. Explain why this was a very 
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bad idea. Give an example of large set of strings for which half the set would have the 
same hash code. 

Exercise 5.6. Suppose you have an object made up of two w-bit integers x and y. Show why 
x © y does not make a good hash code for your object. Give an example of a large set of 
objects that would all have hash code 0. 

Exercise 5.7. Suppose you have an object made up of two w-bit integers x and y. Show why 
x + y does not make a good hash code for your object. Give an example of a large set of 
objects that would all have the same hash code. 

Exercise 5.8. Suppose you have an object made up of two w-bit integers x and y. Suppose 
that the hash code for your object is defined by some deterministic function h(x,y). Prove 
that there exists a large set of objects that have the same hash code. 

Exercise 5.9. Let p = 2 W - 1 for some positive integer w. Explain why for a positive integer x 

(xmod2 w ) + (xdiv2 w ) = xmod(2 w -l) . 
(This gives an algorithm for computing x mod (2 W - 1) by repeatedly setting 

x = x&((1«w)- 1) + x>>>w 

until x < 2 W -1.) 

Exercise 5.10. Find some commonly-used hash table implementation (The Java Collection 
Framework HashMap or the HashTable or LinearHashTable implementations in this book, 
for example) and design a program that stores integers in this data structure so that there 
are integers, x, such that find(x) takes linear time. That is, find a set of n integers for 
which there are en elements that hash to the same table location. 

Depending on how good the implementation is, you may be able to do this just 
by inspecting the code for the implementation, or you may have to write some code that 
does trial insertions and searches, timing how long it takes to add and find particular 
values. (Exactly this kind of code can be used to launch denial of service attacks on internet 
servers.) 
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Binary Trees 



This chapter introduces one of the most fundamental structures in computer science: bi- 
nary trees. The use of the word tree here comes from the fact that, when we draw them, the 
resulting drawing often resembles the trees we find in a forest. There are lots of ways of 
defining binary trees. Mathematically a binary tree is a connected undirected finite graph 
with no cycles, and no vertex of degree greater than three. 

For most computer science applications, binary trees are rooted: A special node, 
r, of degree at most two is called the root of the tree. For every node, u#r, the second 
node on the path from u to r is called the parent of u. Each of the other nodes adjacent 
to u is called a child of u. Most of the binary trees we are interested in are ordered, so we 
distinguish between the left child and right child of u. 

In illustrations, binary trees are usually drawn from the root downward, with the 
root at the top of the drawing and the left and right children respectively given by left and 
right positions in the drawing (Figure 6.1). A binary tree with nine nodes is drawn this 
way in Figure 6. 2. a. 




u.left u. right 



Figure 6.1: The parent, left child, and right child of the node u in a BinaryTree. 
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6.1. BinaryTree: A Basic Binary Tree 





(a) 



(b) 



Figure 6.2: A binary tree with (a) nine real nodes and (b) ten external nodes. 

Binary trees are so important that a terminology has developed around them: The 
depth of a node, u, in a binary tree is the length of the path from u to the root of the tree. 
If a node, w, is on the path from u to r then w is called an ancestor of u and u a descendant 
of w. The subtree of a node, u, is the binary tree that is rooted at u and contains all of u's 
descendants. The height of a node, u, is the length of the longest path from u to one of its 
descendants. The height of a tree is the height of its root. A node, u, is a leaf if it has no 
children. 

We sometimes think of the tree as being augmented with external nodes. Any node 
that does not have a left child has an external node as its left child and any node that does 
not have a right child has an external node as its right child (see Figure 6.2.b). It is easy to 
verify, by induction, that a binary tree having n > 1 real nodes has n + 1 external nodes. 

6.1 BinaryTree: A Basic Binary Tree 

The simplest way to represent a node, u, in a binary tree is to store the (at most three) 
neighbours of u explicitly: 



class 


3TNode<Node 


ex 


tends 


BTNc 


ide<Node>> 


{ 


Node 


left; 














Node 


right; 














Node 


parent; 














} 

















When one of these three neighbours is not present, we set it to nil. In this way, external 
nodes in the tree as well as the parent of the root correspond to the value nil. 

The binary tree itself can then be represented by a reference to its root node, r: 
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BinaryTree 



Node r; 



We can compute the depth of a node, u, in a binary tree by counting the number of 
steps on the path from u to the root: 



int depth(Node u) { 


int d = 0; 


while (u != r) { 


u = u. parent; 


d++; 


} 


return d; 


} 



6.1.1 Recursive Algorithms 

It is very easy to compute facts about binary trees using recursive algorithms. For example, 
to compute the size of (number of nodes in) a binary tree rooted at node u, we recursively 
compute the sizes of the two subtrees rooted at the children of u, sum these sizes, and add 
one: 

BinaryTree 



int size(Node u) { 

if (u == nil) return 0; 

return 1 + size(u.left) + size(u .right ) ; 
} 



To compute the height of a node u we can compute the height of u's two subtrees, 
take the maximum, and add one: 

BinaryTree 



int height(Node u) { 

if (u == nil) return -1; 

return 1 + Math.max(height (u .lef t ) , height (u. right )) ; 
} 



6.1.2 Traversing Binary Trees 

The two algorithms from the previous section use recursion to visit all the nodes in a 
binary tree. Each of them visits the nodes of the binary tree in the same order as the 
following code: 
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BinaryTree 



void traverse (Node u) { 
if (u == nil) return; 
traverse(u.left ) ; 
traverse(u. right ) ; 

} 



Using recursion this way produces very short, simple code, but can be problematic. 
The maximum depth of the recursion is given by the maximum depth of a node in the 
binary tree, i.e., the tree's height. If the height of the tree is very large, then this could very 
well use more stack space than is available, causing a crash. 

Luckily, traversing a binary tree can be done without recursion. This is done using 
an algorithm that uses where it came from to decide where it will go next. See Figure 6.3. 
If we arrive at a node u from u. parent, then the next thing to do is to visit u.lef t. If we 
arrive at u from u.lef t, then the next thing to do is to visit u. right. If we arrive at u from 
u. right, then we are done visiting u's subtree, so we return to u. parent. The following 
code implements this idea, with code included for handling the cases where any of u.lef t, 
u. right, or u. parent is nil: 

BinaryTree 



void traverse2( ) { 

Node u = r, prev = nil, next; 
while (u != nil) { 

if (prev == u. parent) { 

if (u.left != nil) next = u.left; 
else if (u. right != nil) next = u. right; 
else next = u. parent; 
} else if (prev == u.left) { 

if (u. right != nil) next = u. right; 
else next = u. parent; 
} else { 

next = u. parent; 

} 

prev = u; 
u = next; 
} 
} 



The same things that can be computed with recursive algorithms can also be done 
this way. For example, to compute the size of the tree we keep a counter, n, and increment 
n whenever visiting a node for the first time: 
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u.left u. right 



Figure 6.3: The three cases that occur at node u when traversing a binary tree non- 
recursively, and the resulting traversal of the tree. 



BinaryTree 

int size2( ) { 

Node u = r, prev = nil, next; 

int n = 0; 

while (u != nil) { 

if (prev == u. parent) { 
n++; 

if (u.left != nil) next = u.left; 
else if (u. right != nil) next = u. right; 
else next = u. parent; 
} else if (prev == u.left) { 

if (u. right != nil) next = u. right; 
else next = u. parent; 
} else { 

next = u. parent; 

} 

prev = u; 

u = next; 

} 

return n; 



In some implementations of binary trees, the parent field is not used. When this is 
the case, a non-recursive implementation is still possible, but the implementation has to 
use a List (or Stack) to keep track of the path from the current node to the root. 

A special kind of traversal that does not fit the pattern of the above functions is 
the breadth-first traversal. In a breadth-first traversal, the nodes are visited level-by-level 
starting at the root and working our way down, visiting the nodes at each level from left 
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Figure 6.4: During a breadth-first traversal, the nodes of a binary tree are visited level-by- 
level, and left-to-right within each level. 

to right. This is similar to the way we would read a page of English text. (See Figure 6.4.) 

This is implemented using a queue, q, that initially contains only the root, r. At each step, 

we extract the next node, u, from q, process u and add u.lef t and u. right (if they are 

non-nil) to q: 

BinaryTree 



void bfTraverse() { 

Queue<Node> q = new Linkedl_ist<Node>( ) ; 
if (r != nil) q.add(r) ; 
while ( !q. isEmpty ( ) ) { 
Node u = q.remove(); 
if (u.left != nil) q.add(u .lef t ) ; 
if (u. right != nil) q.add(u .right ) ; 
} 
} 



6.2 BinarySearchTree: An Unbalanced Binary Search Tree 

A BinarySearchTree is a special kind of binary tree in which each node, u, also stores a 
data value, u.x, from some total order. The data values in a binary search tree obey the 
binary search tree property: For a node, u, every data value stored in the subtree rooted 
at u.left is less than u.x and every data value stored in the subtree rooted at u. right is 
greater than u.x. An example of a BinarySearchTree is shown in Figure 6.5. 

6.2.1 Searching 

The binary search tree property is extremely useful because it allows us to quickly locate 
a value, x, in a binary search tree. To do this we start searching for x at the root, r. When 
examining a node, u, there are three cases: 
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Figure 6.5: A binary search tree. 

1. If x < u.x then the search proceeds to u.left; 

2. If x > u.x then the search proceeds to u. right; 

3. If x = u.x then we have found the node u containing x. 

The search terminates when Case 3 occurs or when u = nil. In the former case, we found 

x. In the latter case, we conclude that x is not in the binary search tree. 

BinarySearchTree 



T findEQ(T x ) { 
Node u = r; 
while (u != nil) { 

int comp = compare(x, u.x); 
if (comp < 0) 
u = u.left; 
else if (comp > 0) 

u = u. right; 
else 

return u.x; 

} 

return null; 

} 



Two examples of searches in a binary search tree are shown in Figure 6.6. As the 
second example shows, even if we don't find x in the tree, we still gain some valuable 
information. If we look at the last node, u, at which Case 1 occurred, we see that u.x is the 
smallest value in the tree that is greater than x. Similarly the last node at which Case 2 
occurred contains the largest value in the tree that is less than x. Therefore, by keeping 
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(a) 



(b) 



Figure 6.6: An example of (a) a successful search (for 6) and (b) an unsuccessful search 
(for 10) in a binary search tree. 



track of the last node, z, at which Case 1 occurs, a BinarySearchTree can implement the 

f ind(x) operation that returns the smallest value stored in the tree that is greater than or 

equal to x: 
BinarySearchTree 



T find(T x) { 

Node w = r, z = nil; 
while (w != nil) { 

int comp = compare(x, w.x); 
if (comp < 0) { 
z = w; 
w = w.left; 
} else if (comp > 0) { 

w = w. right; 
} else { 

return w.x; 
} 
} 
return z == nil ? null : z.x; 



6.2.2 Addition 

To add a new value, x, to a BinarySearchTree, we first search for x. If we find it, then there 
is no need to insert it. Otherwise, we store x at a leaf child of the last node, p, encountered 
during the search for x. Whether the new node is the left or right child of p depends on 
the result of comparing x and p.x. 
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BinarySearchTree 



boolean add(T x) { 

Node p = f indLast (x) ; 

return addChild(p, newNode(x)); 
} 



BinarySearchTree 



Node findl_ast(T x) { 
Node w = r, prev = nil; 
while (w ! = nil) { 
prev = w; 

int comp = compare(x, w.x); 
if (comp < 0) { 

w = w.left; 
} else if (comp > 0) { 

w = w. right; 
} else { 

return w; 
} 
} 
return prev; 

} 



BinarySearchTree 

boolean addChild(Node p, Node u) { 
if (p == nil) { 

r = u; // inserting into empty tree 

} else { 

int comp = compare(u.x, p.x); 
if (comp < 0) { 

p. left = u; 
} else if (comp > 0) { 

p. right = u; 
} else { 

return false; // u.x is already in the tree 

} 

u. parent = p; 

} 

n++; 

return true; 
} 



An example is shown in Figure 6.7. The most time-consuming part of this process is the 
initial search for x, which takes time proportional to the height of the newly added node 
u. In the worst case, this is equal to the height of the BinarySearchTree. 
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Figure 6.7: Inserting the value 8.5 into a binary search tree. 



6.2.3 Removal 



Deleting a value stored in a node, u, of a BinarySearchTree is a little more difficult. If u is 
a leaf, then we can just detach u from its parent. Even better: If u has only one child, then 
we can splice u from the tree by having u. parent adopt u's child (see Figure 6.8): 

_ BinarySearchTree 



void splice(Node u) { 
Node s, p; 
if (u.left != nil) { 

s = u.left; 
} else { 

s = u. right; 

} 

if (u == r) { 

r = s; 

p = nil; 
} else { 

p = u. parent; 

if (p. left == u) { 
p. left = s; 

} else { 

p. right = s; 

} 
} 
if (s != nil) { 

s. parent = p; 

} 
n--; 

} 
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Figure 6.8: Removing a leaf (6) or a node with only one child (9) is easy. 

'j) (T 




Figure 6.9: Deleting a value (11) from a node, u, with two children is done by replacing u's 
value with the smallest value in the right subtree of u. 



Things get tricky though, when u has two children. In this case, the simplest thing 
to do is to find a node, w, that has less than two children such that we can replace u.x with 
w.x. To maintain the binary search tree property, the value w.x should be close to the value 
of u.x. For example, picking w such that w.x is the smallest value greater than u.x will do. 
Finding the node w is easy; it is the smallest value in the subtree rooted at u. right. This 
node can be easily removed because it has no left child. (See Figure 6.9) 

BinarySearchTree 



void remove (Node u) { 
if (u.left == nil | 

splice(u) ; 
} else { 

Node w = u. right; 
while (w.left != nil) 

w = w.left; 
u.x = w.x; 
splice(w) ; 



} 



u. right == nil) { 



} 
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6.2.4 Summary 

The f ind(x), add(x), and remove(x) operations in a BinarySearchTree each involve follow- 
ing a path from the root of the tree to some node in the tree. Without knowing more about 
the shape of the tree it is difficult to say much about the length of this path, except that 
it is less than n, the number of nodes in the tree. The following (unimpressive) theorem 
summarizes the performance of the BinarySearchTree data structure: 

Theorem 6.1. A BinarySearchTree implements the SSet interface. A BinarySearchTree 

supports the operations add(x), remove(x), and find(x) in O(n) time per operation. 

Theorem 6.1 compares poorly with Theorem 4.1, which shows that the SkiplistS- 
Set structure can implement the SSet interface with 0(log n) expected time per operation. 
The problem with the BinarySearchTree structure is that it can become unbalanced. In- 
stead of looking like the tree in Figure 6.5 it can look like a long chain of n nodes, all but 
the last having exactly one child. 

There are a number of ways of avoiding unbalanced binary search trees, all of 
which lead to data structures that have O(logn) time operations. In Chapter 7 we show 
how O(logn) expected time operations can be achieved with randomization. In Chapter 8 
we show how O(logn) amortized time operations can be achieved with partial rebuilding 
operations. In Chapter 9 we show how O(logn) worst-case time operations can be achieved 
by simulating a tree that is not binary: a tree in which nodes can have up to four children. 

6.3 Discussion and Exercises 

Binary trees have been used to model relationships for literally thousands of years. One 
reason for this is that binary trees naturally model (pedigree) family trees. These are the 
family trees in which the root is a person, the left and right children are the person's 
parents, and so on, recursively. In more recent centuries binary trees have also been used 
to model species-trees in biology, where the leaves of the tree represent extant species and 
the internal nodes of the tree represent speciation events in which two populations of a 
single species evolve into two separate species. 

Binary search trees appear to have been discovered independently by several groups 
in the 1950s [46, Section 6.2.2]. Further references to specific kinds of binary search trees 
are provided in subsequent chapters. 

Exercise 6.1. Prove that a binary tree having n > 1 real nodes has n + 1 external nodes. 
Exercise 6.2. Prove that, if a binary tree, T, has at least one leaf then, either (a) T's root has 
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at most one child or (b) T has greater than one leaf. 

Exercise 6.3. Write a non-recursive variant of the size2() method, size(u), that computes 
the size of the subtree rooted at node u. 

Exercise 6.4. Write a non-recursive method, height2(u), that computes the height of node 
u in a binary search tree. 

Exercise 6.5. Design and implement a method BinarySearchTree method getLE(x), that 
returns a list of all items in the tree that are less than or equal to x. The running time of 
your method should be 0(n' + h) where n' is the number of items less than or equal to x 
and h is the height of the tree. 

Exercise 6.6. A binary tree is balanced if, for every node u, the size of the subtrees rooted 
at u.lef t and u. right differ by at most one. Write a recursive method, isBalanced(), that 
tests if a binary tree is balanced. Your method should run in O(n) time. (Be sure to test 
your code on some large trees with different shapes; it is easy to write a method that takes 
much longer than O(n) time.) 

A pre-order traversal of a binary tree is a traversal that visits each node, u, before 
any of its children. An in-order traversal visits u after visiting all the nodes in u's left 
subtree but before visiting any of the nodes in u's right subtree. A post-order traversal 
visits u only after visiting all other nodes in u's subtree. The pre/in/post-order numbering 
of a tree labels the nodes of a tree with the integers 0,...,n- 1 in the order that they are 
encountered by a pre/in/post-order traversal. See Figure 6.10 for an example. 

Exercise 6.7. Create a subclass of BinaryTree whose nodes have fields for storing pre- 
order, post-order, and in-order numbers. Write recursive methods preOrderl\lumber(), 
inOrderl\lumber(), and postOrderNumbers() that assign these numbers correctly. These 
methods should each run in O(n) time. 

Exercise 6.8. Write non-recursive functions nextPreOrder(u), nextlnOrder(u), and nextPostOrder(u) 
that return the node that follows u in a pre-order, in-order, or post-order traversal, respec- 
tively. These functions should take amortized constant time. That is, if we start at any 
node u and repeatedly call one of these functions and assign the return value to u until 
u=null, then the cost of all these calls should be O(n). 

Exercise 6.9. Suppose we are given a binary tree with pre- post- and in-order numbers as- 
signed to the nodes. Show how these numbers can be used to answer each of the following 
questions in constant time: 

1. Given a node u, determine the size of the subtree rooted at u. 
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Figure 6.10: Pre-order, post-order, and in-order numberings of a binary tree. 

2. Given a node u, determine the depth of u. 

3. Given two nodes u and w, determine if u is an ancestor of w 

Exercise 6.10. Suppose you are given a list of nodes with pre-order and in-order num- 
bers assigned. Prove that there is at most one possible tree with this pre-order/in-order 
numbering and show how to construct it. 

Exercise 6.1 1. Show that the shape of any binary tree on n nodes can be represented using 
at most 2(n - 1) bits. (Hint: think about recording what happens during a traversal and 
then playing back that recording to reconstruct the tree.) 

Exercise 6.12. Describe a sequence of n operations on an initially empty BinarySearchTree 
that results in a tree of height n - 1 . How many such sequences are there? 

Exercise 6.13. If we have some BinarySearchTree and perform the operations add(x) fol- 
lowed by remove(x) (with the same value of x) do we necessarily return to the original 
tree? 

Exercise 6.14. Can a remove(x) operation increase the height of any node in a Binary- 
SearchTree? If so, by how much? 

Exercise 6.15. Can an add(x) operation increase the height of any node in a BinarySearch- 
Tree? Can it increase the height of the tree? If so, by how much? 



126 



6. Binary Trees 6.3. Discussion and Exercises 

Exercise 6.16. Design and implement a version of BinarySearchTree in which each node, 
u, maintains values u.size (the size of the subtree rooted at u), u. depth (the depth of u), 
and u. height (the height of the subtree rooted at u). 

These values should be maintained, even during the add(x) and remove(x) opera- 
tions, but this should not increase the cost of these operations by more than a constant 
factor. 
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Chapter 7 



Random Binary Search Trees 



In this chapter, we present a binary search tree structure that uses randomization to 
achieve O(logn) expected time for all operations. 

7.1 Random Binary Search Trees 

Consider the two binary search trees shown in Figure 7.1. The one on the left is a list and 
the other is a perfectly balanced binary search tree. The one on the left has height n-1 = 14 
and the one on the right has height three. 

Imagine how these two trees could have been constructed. The one on the left 
occurs if we start with an empty BinarySearchTree and add the sequence 

< 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1 0, 1 1, 1 2, 1 3, 1 4) . 

No other sequence of additions will create this tree (as you can prove by induction on n). 
On the other hand, the tree on the right can be created by the sequence 

(7,3,11,1,5,9,13,0,2, 4, 6, 8, 1 0, 1 2, 1 4) . 

Other sequences work as well, including 

(7, 3, 1 , 5, 0, 2, 4, 6, 1 1, 9, 1 3, 8, 1 0, 1 2, 1 4) , 

and 

< 7, 3, 1, 1 1, 5, 0, 2, 4, 6, 9, 1 3, 8, 1 0, 1 2, 1 4) . 

In fact, there are 21,964,800 addition sequences that generate the tree on the right and 
only one that generates the tree on the left. 

The above example gives some anecdotal evidence that, if we choose a random 
permutation of 0, . . . , 1 4, and add it into a binary search tree then we are more likely to get 
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Figure 7.1: Two binary search trees containing the integers 0, ..., 14. 

a very balanced tree (the right side of Figure 7.1) than we are to get a very unbalanced tree 
(the left side of Figure 7.1). 

We can formalize this notion by studying random binary search trees. A random 
binary search tree of size n is obtained in the following way: Take a random permutation, 
Xo,...,x n _i, of the integers 0,...,n - 1 and add its elements, one by one, into a Binary- 
SearchTree. By random permutation we mean that each of the possible n! permutations 
(orderings) of 0, . . . , n - 1 is equally likely, so that the probability of obtaining any particu- 
lar permutation is 1/n!. 

Note that the values 0, ...,n - 1 could be replaced by any ordered set of n elements 
without changing any of the properties of the random binary search tree. The element 
x 6 {0, ...,n -1} is simply standing in for the element of rank x in an ordered set of size n. 

Before we can present our main result about random binary search trees, we must 
take some time for a short digression to discuss a type of number that comes up frequently 
when studying randomized structures. For a non-negative integer, k, the fc-th harmonic 
number, denoted H k , is defined as 



H k = 1 + 1/2 + 1/3 ■ 



Ilk . 



The harmonic number H k has no simple closed form, but it is very closely related to the 
natural logarithm of k. In particular, 

lnk<H k <lnfc + l . 

rk 

Readers who have studied calculus might notice that this is because the integral J ( 1/x) dx - 
Ink. Keeping in mind that an integral can be interpreted as the area between a curve and 
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12 3 



Figure 7.2: The A:th harmonic number H^ = H, =1 1/z is upper-bounded by 1 + J (l/x)dx 
and lower-bounded by [ ( 1/x) dx. The value of f ( 1/x) dx is given by the area of the shaded 
region, while the value of H k is given by the area of the rectangles. 

rk 

the x-axis, the value of H k can be lower-bounded by the integral J (l/x)dx and upper- 

rk 

bounded by 1 + J (l/x)dx. (See Figure 7.2 for a graphical explanation.) 
Lemma 7.1. In a random binary search tree of size n, the following statements hold: 

1. For any x 6 {0,...,n-l}, the expected length of the search path for x is H x+1 +H n _ x -0(1). 1 

2. For any xe(-l,n)\{0,...,n-l|, the expected length of the search path for x is H|- X -|+H n _r x -|. 

We will prove Lemma 7.1 in the next section. For now, consider what the two parts 
of Lemma 7.1 tell us. The first part tells us that if we search for an element in a tree of size 
n, then the expected length of the search path is at most 2 In n + 0( 1 ). The second part tells 
us the same thing about searching for a value not stored in the tree. When we compare the 
two parts of the lemma, we see that it is only slightly faster to search for something that is 
in the tree compared to something that is not in the tree. 

7.1.1 Proof of Lemma 7.1 

The key observation needed to prove Lemma 7.1 is the following: The search path for a 
value x in the open interval (-l,n) in a random binary search tree, T, contains the node 
with key i < x if and only if, in the random permutation used to create T, i appears before 
any of {/+ l,i + 2,...,|_xJ}. 



The expressions x + 1 and n - x can be interpreted respectively as the number of elements in the tree less 
than or equal to x and the number of elements in the tree greater than or equal to x. 
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Figure 7.3: The value i < x is on the search path for x if and only if i is the first element 
among [i,i + l,...,|_x_|} added to the tree. 

To see this, refer to Figure 7.3 and notice that, until some value in \i,i + l,...,[xj} 
is added, the search paths for each value in the open interval (i - 1,|_ X J + 1) are identical. 
(Remember that for two search values to have different search paths, there must be some 
element in the tree that compares differently with them.) Let j be the first element in 
\i, i + 1, . . ., |_xj } to appear in the random permutation. Notice that j is now and will always 
be on the search path for x. If j * i then the node u; containing j is created before the node 
u,- that contains i. Later, when i is added, it will be added to the subtree rooted at u.-.lef t, 
since i < j. On the other hand, the search path for x will never visit this subtree because it 
will proceed to uy.right after visiting uy. 

Similarly, for i > x, i appears in the search path for x if and only if i appears before 
any of {[x],[x] + \,...,i - 1} in the random permutation used to create T. 

Notice that, if we start with a random permutation of {0, ...,n}, then the subse- 
quences containing only \i,i + I,..., |_x J} and {[x],[x] + \,...,i -\) are also random permu- 
tations of their respective elements. Each element, then, in the subsets {z,i + l,...,[xj} and 
{[x], [x] + l,...,z- 1} is equally likely to appear before any other in its subset in the random 
permutation used to create T . So we have 

( l/(|_xj-z' + l) if i <x 
Pr{/ is on the search path for x} = { 

l/(z-rxl + l) if f > x 
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With this observation, the proof of Lemma 7.1 involves some simple calculations 
with harmonic numbers: 

Proof of Lemma 7.1. Let 7/ be the indicator random variable that is equal to one when i 
appears on the search path for x and zero otherwise. Then the length of the search path is 
given by 

L '■ 

ie{0 n-l)\(x) 

so, if x 6 {0,...,n-l}, the expected length of the search path is given by (see Figure 7.4.a) 



x-l n-1 



i=0 l' = X + l 



x-l n-1 



j=0 i=x+l 

x-l n-1 

:£l/(|xj-i+l) + ^ l/(f-rxl + l) 



1=0 !=X+1 

x-l n-1 

= ^l/(x-* + l)+ J^ l/(*-x + l) 

i=o ;=x+i 

1 1 1 



2 3 x + 1 

1 1 1 



2 3 n - x 

= #x+l + H n _„ - 2 • 

The corresponding calculations for a search value x 6 (-1, n) \ {0, . . . , n - 1 } are almost iden- 
tical (see Figure 7.4.b). □ 

7.1.2 Summary 

The following theorem summarizes the performance of a random binary search tree: 

Theorem 7.1. A random binary search tree can be constructed in O(nlogn) time. In a random 
binary search tree, the f lnd(x) operation takes O(logn) expected time. 

We should emphasize again that the expectation in Theorem 7.1 is with respect to 
the random permutation used to create the random binary search tree. In particular, it 
does not depend on a random choice of x; it is true for every value of x. 

7.2 Treap: A Randomized Binary Search Tree 

The problem with random binary search trees is, of course, that they are not dynamic. 
They don't support the add(x) or remove(x) operations needed to implement the SSet in- 
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Figure 7.4: The probabilities of an element being on the search path for x when (a) x is an 
integer and (b) when x is not an integer. 

terface. In this section we describe a data structure called a Treap that uses Lemma 7.1 to 
implement the SSet interface. 2 

A node in a Treap is like a node in a BinarySearchTree in that it has a data value, 
x, but it also contains a unique numerical priority, p, that is assigned at random: 
Treap 



class Node<T> extends BinarySearchTree. BSTNode<Node<T>,T> { 

int p; 
} 



In addition to being a binary search tree, the nodes in a Treap also obey the heap property: 

• (Heap Property) At every node u, except the root, u. parent. p < u.p. 

In other words, each node has a priority smaller than that of its two children. An example 
is shown in Figure 7.5. 

The heap and binary search tree conditions together ensure that, once the key (x) 
and priority (p) for each node are defined, the shape of the Treap is completely determined. 
The heap property tells us that the node with minimum priority has to be the root, r, of 
the Treap. The binary search tree property tells us that all nodes with keys smaller than 
r.x are stored in the subtree rooted at r.lef t and all nodes with keys larger than r.x are 
stored in the subtree rooted at r.right. 



The names Treap comes from the fact that this data structure is simultaneously a binary search tree 
(Section 6.2) and a heap (Chapter 10). 
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Figure 7.5: An example of a Treap containing the integers 0, ...,9. Each node, u, is illus- 
trated as a box containing u.x, u.p. 

The important point about the priority values in a Treap is that they are unique 
and assigned at random. Because of this, there are two equivalent ways we can think 
about a Treap. As defined above, a Treap obeys the heap and binary search tree properties. 
Alternatively we can think of a Treap as a BinarySearchTree whose nodes were added 
in increasing order of priority. For example, the Treap in Figure 7.5 can be obtained by 
adding the sequence of (x, p) values 

<(3,1),(1,6),(0,9),(5,11),(4,14),(9,17),(7,22),(6,42),(8,49),(2,99)) 

into a BinarySearchTree. 

Since the priorities are chosen randomly, this is equivalent to taking a random 
permutation of the keys — in this case the permutation is 

(3,1,0,5,9,4,7,6,8,2) 

— and adding these to a BinarySearchTree. But this means that the shape of a treap is 
identical to that of a random binary search tree. In particular, if we replace each key x by 
its rank, 3 then Lemma 7.1 applies. Restating Lemma 7.1 in terms of Treaps, we have: 

Lemma 7.2. In a Treap that stores a set Sofn keys, the following statements hold: 

1. For any x 6 S, the expected length of the search path for x is H r ^ +1 + H n _ r ^ - 0(1). 

2. For any x <£. S, the expected length of the search path for x is H r i x \ + H n _ r ( x ). 



The rank of an element x in a set S of elements is the number of elements in S that are less than x. 
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rotateRight(u) => 
<= rotateLeft(w) 




Figure 7.6: Left and right rotations in a binary search tree. 



Here, r(x) denotes the rank of x in the set S U {x}. 



Again, we emphasize that the expectation in Lemma 7.2 is taken over the random 
choices of the priorities for each node. It does not require any assumptions about the 
randomness in the keys. 

Lemma 7.2 tells us that Treaps can implement the find(x) operation efficiently 
However, the real benefit of a Treap is that it can support the add(x) and delete(x) oper- 
ations. To do this, it needs to perform rotations in order to maintain the heap property. 
Refer to Figure 7.6. A rotation in a binary search tree is a local modification that takes a 
parent u of a node w and makes w the parent of u, while preserving the binary search tree 
property. Rotations come in two flavours: left or right depending on whether w is a right or 
left child of u, respectively. 

The code that implements this has to handle these two possibilities and be careful 

of a boundary case (when u is the root) so the actual code is a little longer than Figure 7.6 

would lead a reader to believe: 

BinarySearchTree 



void rotatel_eft(Node u) { 
Node w = u. right; 
w. parent = u. parent; 
if (w. parent != nil) { 

if (w. parent .left == u) { 

w. parent .left = w; 
} else { 

w. parent .right = w; 
} 
} 
u. right = w.left; 
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if (u. right != nil) { 






u. right .parent = u; 






} 






u. parent = w; 






w.left = u; 






if (u == r) { r = w; r. parent = 


= nil; 


} 


} 






void rotateRight(Node u) { 






Node w = u.left; 






w. parent = u. parent; 






if (w. parent != nil) { 






if (w. parent .left == u) { 






w. parent . left = w; 






} else { 






w. parent .right = w; 






} 






} 






u.left = w. right; 






if (u.left != nil) { 






u.left .parent = u; 






} 






u. parent = w; 






w. right = u; 






if (u == r) { r = w; r. parent = 


= nil; 


} 


} 







In terms of the Treap data structure, the most important property of a rotation is that the 
depth of w decreases by one while the depth of u increases by one. 

Using rotations, we can implement the add(x) operation as follows: We create a 
new node, u, and assign u.x = x and pick a random value for u.p. Next we add u using the 
usual add(x) algorithm for a BinarySearchTree, so that u is now a leaf of the Treap. At 
this point, our Treap satisfies the binary search tree property, but not necessarily the heap 
property. In particular, it may be the case that u. parent. p > u.p. If this is the case, then we 
perform a rotation at node w=u. parent so that u becomes the parent of w. If u continues 
to violate the heap property we will have to repeat this, decreasing u's depth by one every 

time, until u either becomes the root or u. parent. p < u.p. 

Treap 



boolean add(T x) { 

Node<T> u = newNode(); 
u.x = x; 

u.p = rand. nextlnt ( ) ; 
if (super. add(u) ) { 
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bubbleUp(u) ; 
return true; 

} 

return false; 

} 

void bubbleUp(Node<T> u) { 

while (u. parent != nil && u. parent. p > u.p) { 
if (u. parent .right == u) { 

rotatel_eft(u. parent) ; 
} else { 

rotateRight(u. parent) ; 
} 
} 
if (u. parent == nil) { 

r = u; 
} 
} 



An example of an add(x) operation is shown in Figure 7 .7 . 

The running time of the add(x) operation is given by the time it takes to follow 
the search path for x plus the number of rotations performed to move the newly-added 
node, u, up to its correct location in the Treap. By Lemma 7.2, the expected length of the 
search path is at most 2 In n + 0(1). Furthermore, each rotation decreases the depth of u. 
This stops if u becomes the root, so the expected number of rotations cannot exceed the 
expected length of the search path. Therefore, the expected running time of the add(x) 
operation in a Treap is O(logn). (Exercise 7.4 asks you to show that the expected number 
of rotations performed during an addition is actually only 0(1).) 

The remove(x) operation in a Treap is the opposite of the add(x) operation. We 
search for the node, u, containing x and then perform rotations to move u downwards until 
it becomes a leaf and then we splice u from the Treap. Notice that, to move u downwards, 
we can perform either a left or right rotation at u, which will replace u with u. right or 
u.lef t, respectively. The choice is made by the first of the following that apply: 

1. If u.lef t and u. right are both null, then u is a leaf and no rotation is performed. 

2. If u.lef t (or u. right) is null, then perform a right (or left, respectively) rotation at 
u. 

3. If u.left.p < u. right. p (or u.left.p > u. right. p), then perform a right rotation (or 
left rotation, respectively) at u. 
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Figure 7.7: Adding the value 1.5 into the Treap from Figure 7.5. 
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These three rules ensure that the Treap doesn't become disconnected and that the heap 
property is restored once u is removed. 

Treap 



boolean remove(T x) { 

Node<T> u = findLast(x) ; 

if (u != nil && compare(u . x, x) == 0) { 

trickleDown(u) ; 

splice(u) ; 

return true; 

} 

return false; 

} 

void trickleDown(Node<T> u) { 

while (u.left != nil | | u. right != nil) { 
if (u.left == nil) { 

rotateLeft(u) ; 
} else if (u. right == nil) { 

rotateRight(u) ; 
} else if (u.left.p < u. right. p) { 

rotateRight (u) ; 
} else { 

rotateLeft (u) ; 

} 

if (r == u) { 

r = u. parent; 
} 



} 



} 



An example of the remove(x) operation is shown in Figure 7.8. 

The trick to analyze the running time of the remove(x) operation is to notice that 
this operation is the reverse of the add(x) operation. In particular, if we were to reinsert x, 
using the same priority u.p, then the add(x) operation would do exactly the same number 
of rotations and would restore the Treap to exactly the same state it was in before the 
remove(x) operation took place. (Reading from bottom-to-top, Figure 7.8 illustrates the 
addition of the value 9 into a Treap.) This means that the expected running time of the 
remove(x) on a Treap of size n is proportional to the expected running time of the add(x) 
operation on a Treap of size n-1. We conclude that the expected running time of remove(x) 
is O(logn). 
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Figure 7.8: Removing the value 9 from the Treap in Figure 7.5. 
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7.2.1 Summary 

The following theorem summarizes the performance of the Treap data structure: 

Theorem 7.2. A Treap implements the SSet interface. A Treap supports the operations add(x), 
remove(x), and find(x) in O(logn) expected time per operation. 

It is worth comparing the Treap data structure to the SkiplistSSet data structure. 
Both implement the SSet operations in O(logn) expected time per operation. In both data 
structures, add(x) and remove(x) involve a search and then a constant number of pointer 
changes (see Exercise 7.4 below). Thus, for both these structures, the expected length of 
the search path is the critical value in assessing their performance. In a SkiplistSSet, the 
expected length of a search path is 

21ogn + 0(l) , 

In a Treap, the expected length of a search path is 

21nn + 0(l)«1.3861ogn + 0(l) . 

Thus, the search paths in a Treap are considerably shorter and this translates into notice- 
ably faster operations on Treaps than Skiplists. Exercise 4.2 in Chapter 4 shows how the 
expected length of the search path in a Skiplist can be reduced to 

elnn + 0(l)«1.8841ogn + 0(l) 

by using biased coin tosses. Even with this optimization, the expected length of search 
paths in a SkiplistSSet is noticeably longer than in a Treap. 

7.3 Discussion and Exercises 

Random binary search trees have been studied extensively. Devroye [17] gives a proof of 
Lemma 7.1 and related results. There are much stronger results in the literature as well. 
The most impressive of which is due to Reed [62], who shows that the expected height of 
a random binary search tree is 

a In n - /S lnln n + 0( 1 ) 

where a « 4.31107 is the unique solution on [2,oo) of the equation a\n([2e/a)) - 1 and 
/S = 2in(a/2) " Furthermore, the variance of the height is constant. 
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The name Treap was coined by Aragon and Seidel [65] who discussed Treaps 
and some of their variants. However, their basic structure was studied much earlier by 
Vuillemin [71] who called them Cartesian trees. 

One space-optimization of the Treap data structure that is sometimes performed is 
the elimination of the explicit storage of the priority p in each node. Instead, the priority of 
a node, u, is computed by hashing u's address in memory (in 32-bit Java, this is equivalent 
to hashing u.hashCodeQ). Although a number of hash functions will probably work well 
for this in practice, for the important parts of the proof of Lemma 7.1 to remain valid, the 
hash function should be randomized and have the min-wise independent property: For any 
distinct values X\,..., x^, each of the hash values h{x.\ ),..., h{x\) should be distinct with high 
probability and, for each i e{l,...,k}, 

Pr{/?(xj) = mm.{h{x\), . . . ,h(xi)}} < elk 

for some constant c. One such class of hash functions that is easy to implement and fairly 
fast is tabulation hashing (Section 5.2.3). 

Another Treap variant that doesn't store priorities at each node is the randomized 
binarysearch tree of Martinez and Roura [49]. In this variant, every node, u, stores the 
size, u.size, of the subtree rooted at u. Both the add(x) and remove(x) algorithms are 
randomized. The algorithm for adding x to the subtree rooted at u does the following: 

1. With probability l/(size(u)+ 1), x is added the usual way, as a leaf, and rotations are 
then done to bring x up to the root of this subtree. 

2. Otherwise, x is recursively added into one of the two subtrees rooted at u.lef t or 
u. right, as appropriate. 

The first case corresponds to an add(x) operation in a Treap where x's node receives a 
random priority that is smaller than any of the size(u) priorities in u's subtree, and this 
case occurs with exactly the same probability. 

Removing a value x from a randomized binary search tree is similar to the process 
of removing from a Treap. We find the node, u, that contains x and then perform rotations 
that repeatedly increase the depth of u until it becomes a leaf, at which point we can splice 
it from the tree. The choice of whether to perform a left or right rotation at each step is 
randomized. 

1. With probability u.lef t.size/(u. size - 1), we perform a right rotation at u, making 
u.lef t the root of the subtree that was formerly rooted at u. 
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2. With probability u. right. size/(u. size - 1), we perform a left rotation at u, making 
u.right the root of the subtree that was formerly rooted at u. 

Again, we can easily verify that these are exactly the same probabilities that the removal 
algorithm in a Treap will perform a left or right rotation of u. 

Randomized binary search trees have the disadvantage, compared to treaps, that 
when adding and removing elements they make many random choices and they must 
maintain the sizes of subtrees. One advantage of randomized binary search trees over 
treaps is that subtree sizes can serve another useful purpose, namely to provide access by 
rank in O(logn) expected time (see Exercise 7.7). In comparison, the random priorities 
stored in treap nodes have no use other than keeping the treap balanced. 

Exercise 7.1. Prove the assertion that there are 21,964,800 sequences that generate the 
tree on the right hand side of Figure 7.1. (Hint: Give a recursive formula for the number 
of sequences that generate a complete binary tree of height h and evaluate this formula for 
h = 3.) 

Exercise 7.2. Design and implement the permute(a) method that takes as input an array, 
a, containing n distinct values and randomly permutes a. The method should run in O(n) 
time and you should prove that each of the n! possible permutations of a is equally proba- 
ble. 

Exercise 7.3. 

Exercise 7 A. Use both parts of Lemma 7.2 to prove that the expected number of rotations 
performed by an add(x) operation (and hence also a remove(x) operation) is 0(1). 

Exercise 7.5. Modify the Treap implementation given here so that it does not explicitly 
store priorities. Instead, it should simulate them by hashing the hashCodeQ of each node. 

Exercise 7.6. Design and implement a Treap implementation in which each node keeps 
track of the minimum and maximum values in its subtree. Using this extra informa- 
tion, add a f ingerFind(x, u) method that executes the find(x) operation with the help 
of a pointer to the node u (which is hopefully not far from the node that contains x). 
This operation should start at u and walk upwards until it reaches a node w such that 
w.min < x < w.max. and a node that contains a value no less than x. At that point, it should 
perform a standard search for x starting from w. (One can show that f ingerFind(x, u) takes 
0(1 + logr) time, where r is the number of elements in the treap whose value is between x 
and u.x.) 
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Exercise 7 .7. Design and implement a version of a Treap that includes a get(i) operation 
that returns the key with rank i in the Treap. (Hint: Have each node, u, keep track of the 
size of the subtree rooted at u.) 

Exercise 7.8. Implement a TreapList, an implementation of the List interface as a treap. 
Each node in the treap should store a list item, and an in-order traversal of the treap finds 
the items in the same order that they occur in the list. All the List operations get(i), 
set(i, x), add(i, x) and remove(i) should run in O(logn) expected time. 

Exercise 7.9. Design and implement a version of a Treap that supports the split(x) opera- 
tion. This operation removes all values from the Treap that are greater than x and returns 
a second Treap that contains all the removed values. 

For example, the code t2 = t.split(x) removes from t all values greater than x and 
returns a new Treap t2 containing all these values. The split(x) operation should run in 
O(logn) expected time. 

Warning: For this modification to work properly and still allow the size() method 
to run in constant time, it is necessary to implement the modifications in Exercise 7.7. 

Exercise 7.10. Design and implement a version of a Treap that supports the absorb(t2) 
operation, which can be thought of as the inverse of the split(x) operation. This opera- 
tion removes all values from the Treap t2 and adds them to the receiver. This operation 
presupposes that the smallest value in 1 2 is greater than the largest value in the receiver. 
The absorb(t2) operation should run in O(logn) expected time. 

Exercise 7.11. Implement Martinez's randomized binary search trees, as discussed in this 
section. Compare the performance of your implementation with that of the Treap imple- 
mentation. 
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Chapter 8 



Scapegoat Trees 



In this chapter, we study a binary search tree data structure, the ScapegoatTree. This 
structure is based on the common wisdom that, when something goes wrong, the first 
thing we should do is find someone to blame it on (the scapegoat). Once blame is firmly 
established, we can leave the scapegoat to fix the problem. 

A ScapegoatTree that keeps itself balanced by partial rebuilding operations. Dur- 
ing a partial rebuilding operation, an entire subtree is deconstructed and rebuilt into a 
perfectly balanced subtree. There are many ways of rebuilding a subtree rooted at node u 
into a perfectly balanced tree. One of the simplest is to traverse u's subtree, gathering all 
its nodes into an array a and then to recursively build a balanced subtree using a. If we let 
m = a. length/2, then the element a[m] becomes the root of the new subtree, a[0], ...,a[m- 1] 
get stored recursively in the left subtree and a[m+ 1], ...,a[a. length - 1] get stored recur- 
sively in the right subtree. 

ScapegoatTree 



void rebuild (Node<T> u) { 
int ns = size(u) ; 
Node<T> p = u. parent; 

Node<T>[] a = (Node<T>[ ] ) Array . newInstance(Node. class, ns); 
packlntoArray (u, a, 0); 
if ( p == nil) { 

r = buildBalanced(a, 0, ns); 

r. parent = nil; 
} else if (p. right == u) { 

p. right = buildBalanced(a, 0, ns); 

p. right . parent = p; 
} else { 

p. left = buildBalanced(a, 0, ns); 

p. left .parent = p; 
} 
} 
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int packlntoArray (Node<T> u, Node<T>[] a, int i) { 




if ( u == nil) { 




return i; 

} 

i = packlntoArray (u . left , a, i); 






a[i++] = u; 




return packlntoArray (u. right , a, i); 

} 

Node<T> buildBalanced(Node<T>[ ] a, int i, int ns) { 






if (ns == 0) 




return nil; 




int m = ns / 2; 




a[i + mj.left = buildBalanced(a, i, m) ; 




if (a[i + m] .left != nil) 




a[i + m] .left .parent = a[i + m] ; 




a[i + m] . right = buildBalanced(a, i + m + 1, ns - m - 


- 1); 


if (a[i + m] . right != nil) 




a[i + m] .right . parent = a[i + m] ; 




return a[i + m] ; 
} 





A call to rebuild(u) takes 0(size(u)) time. The subtree built by rebuild(u) has minimum 
height; there is no tree of smaller height that has size(u) nodes. 

8.1 ScapegoatTree: A Binary Search Tree with Partial Rebuilding 

A ScapegoatTree is a BinarySearchTree that, in addition to keeping track of the number, 

n, of nodes in the tree also keeps a counter, q, that maintains an upper-bound on the 

number of nodes. 
ScapegoatTree 



int q; 



At all times, n and q obey the following inequalities: 

q/2 < n < q . 

In addition, a ScapegoatTree has logarithmic height; at all times, the height of the scape- 
goat tree does not exceed: 

log 3/2 q < log 3/2 2n < log 3/2 n + 2 . (8.1) 

Even with this constraint, a ScapegoatTree can look surprisingly unbalanced. The tree in 
Figure 8.1 has q = n = 10 and height 5 < log 3/2 10 « 5.679. 
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Figure 8.1: A ScapegoatTree with 10 nodes and height 5. 

Implementing the f ind(x) operation in a ScapegoatTree is done using the stan- 
dard algorithm for searching in a BinarySearchTree (see Section 6.2). This takes time 
proportional to the height of the tree which, by (8.1) is O(logn). 

To implement the add(x) operation, we first increment n and q and then use the 
usual algorithm for adding x to a binary search tree; we search for x and then add a new 
leaf u with u.x = x. At this point, we may get lucky and the depth of u might not exceed 
log 3/2 q- If so, then we leave well enough alone and don't do anything else. 

Unfortunately, it will sometimes happen that depth(u) > log 3/2 q- In this case we 
need to do something to reduce the height. This isn't a big job; there is only one node, 
namely u, whose depth exceeds log 3/2 q- To fix u, we walk from u back up to the root 
looking for a scapegoat, w. The scapegoat, w, is a very unbalanced node. It has the property 
that 

size(w.child) 2 
size(w) 3 

where w. child is the child of w on the path from the root to u. We'll very shortly prove that 
a scapegoat exists. For now, we can take it for granted. Once we've found the scapegoat 
w, we completely destroy the subtree rooted at w and rebuild it into a perfectly balanced 
binary search tree. We know, from (8.2), that, even before the addition of u, w's subtree was 
not a complete binary tree. Therefore, when we rebuild w, the height decreases by at least 
1 so that height of the ScapegoatTree is once again at most log 3/2 q. 
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Figure 8.2: Inserting 3.5 into a ScapegoatTree increases its depth to 6, which violates (8.1) 
since 6 > log 3/2 11 ~ 5.914. A scapegoat is found at the node containing 5. 







Scapegoat 


free 




b 


aolean add(T x) { 








// first do basic insertion keeping 


track 


of depth 




Node<T> u = newNode(x); 










int d = addWithDepth(u) ; 










if (d > log32(q)) { 










// depth exceeded, find 


scapegoat 








Node<T> w = u. parent; 










while (3*size(w) <= 2*size(w. parent ) ) 






w = w. parent; 










rebuild (w. parent) ; 

} 

return d >= 0; 








} 









If we ignore the cost of finding the scapegoat w and rebuilding the subtree rooted at 
w, then the running time of add(x) is dominated by the initial search, which takes 0(log q) = 
O(logn) time. We will account for the cost of finding the scapegoat and rebuilding using 
amortized analysis in the next section. 

The implementation of remove(x) in a ScapegoatTree is very simple. We search for 
x and remove it using the usual algorithm for removing a node from a BinarySearchTree. 
(Note that this can never increase the height of the tree.) Next, we decrement n but leave q 
unchanged. Finally, we check if q > 2n and, if so, we rebuild the entire tree into a perfectly 
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balanced binary search tree and set q = n. 

ScapegoatTree 



boolean remove(T x) { 
if (super. remove( x) ) { 
if (2*n < q) { 
rebuild(r) ; 

q = n; 

} 

return true; 

} 

return false; 

} 



Again, if we ignore the cost of rebuilding, the running time of the remove(x) operation is 
proportional to the height of the tree, and is therefore O(logn). 

8.1.1 Analysis of Correctness and Running-Time 

In this section we analyze the correctness and amortized running time of operations on a 
ScapegoatTree. We first prove the correctness by showing that, when the add(x) operation 
results in a node that violates Condition (8.1), then we can always find a scapegoat: 

Lemma 8.1. Let ubea node of depth h > log 3/2 q in a ScapegoatTree. Then there exists a node 
w on the path from u to the root such that 

SiZ6(w) >2/3. 



size(parent(w)) 

Proof. Suppose, for the sake of contradiction, that this is not the case, and 

size(w) 



size(parent(w)) 



<2/3 



for all nodes w on the path from u to the root. Denote the path from the root to u as 
r = uq,...,u/, = u. Then, we have size(u ) = n, size(uj) < |n, size(u 2 ) < gn and, more 
generally, 

size(u,-)<(-J n . 

But this gives a contradiction, since size(u) > 1, hence 

, , (2\ h /2\ l0 S3/2q /2\ lo g3/2 n n\ 
l<size(u)<(-)n<(-) n<(-) n = - n = 1 . D 
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Next, we analyze the parts of the running time that we have not yet accounted for. 
There are two parts: The cost of calls to size(u) when search for scapegoat nodes, and the 
cost of calls to rebuild(w) when we find a scapegoat w. The cost of calls to size(u) can be 
related to the cost of calls to rebuild(w), as follows: 

Lemma 8.2. During a call to add(x) in a ScapegoatTree, the cost of finding the scapegoat w 
and rebuilding the subtree rooted at w is 0(size(w)). 

Proof. The cost of rebuilding the scapegoat node w, once we find it, is 0(size(w)). When 
searching for the scapegoat node, we call size(u) on a sequence of nodes u , ...,ujt until 
we find the scapegoat u^ = w. However, since u^ is the first node in this sequence that is a 

scapegoat, we know that 

2 
size(Uj)< -size(u, +1 ) 

for all i 6 {0, ...,k-2}. Therefore, the cost of all calls to size(u) is 
t k \ I fc-1 



O 



^size(u jfc _ i ) 



= O 



i=0 



= o 



o 



size(u )t )+^size(u it _ ! _ 1 

fc-l r. j 

size(uj.) + ^(-J sizefu^) 
i=0 

/ t k-\ j\\ 

size(u^) 1 + ^(3) 



= 0(size(u k )) = 0(size(w)) , 

where the last line follows from the fact that the sum is a geometrically decreasing series. 

□ 

All that remains is to prove an upper-bound on the cost of all calls to rebuild(u) 
during a sequence of m operations: 

Lemma 8.3. Starting with an empty ScapegoatTree any sequence of m add(x) and remove(x) 
operations causes at most O(mlogm) time to be used by rebuild(u) operations. 

Proof. To prove this, we will use a credit scheme. We imagine that each node stores a 
number of credits. Each credit can pay for some constant, c, units of time spent rebuilding. 
The scheme gives out a total of 0(m log m) credits and every call to rebuild(u) is paid for 
with credits stored at u. 
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During an insertion or deletion, we give one credit to each node on the path to 
the inserted node, or deleted node, u. In this way we hand out at most log 3/2 q < log 3/2 m 
credits per operation. During a deletion we also store an additional 1 credit "on the side." 
Thus, in total we give out at most 0(m log m) credits. All that remains is to show that these 
credits are sufficient to pay for all calls to rebuild(u). 

If we call rebuild(u) during an insertion, it is because u is a scapegoat. Suppose, 

without loss of generality, that 

size(u.left) 2 

> - . 

size(u) 3 

Using the fact that 

size(u) = l + size(u.lef t) + size(u. right) 

we deduce that 

— size(u.lef t) > size(u. right) 

and therefore 

size(u.left)- size(u. right) > -size(u.left) > -size(u) . 

Now, the last time a subtree containing u was rebuilt (or when u was inserted, if a subtree 
containing u was never rebuilt), we had 

size(u. left)-size(u. right) < 1 . 

Therefore, the number of add(x) or remove(x) operations that have affected u.left or 
u. right since then is at least 

— size(u)-l . 

3 

and there are therefore at least this many credits stored at u that are available to pay for 
the 0(size(u)) time it takes to call rebuild(u). 

If we call rebuild(u) during a deletion, it is because q > 2n. In this case, we have 
q - n > n credits stored "on the side" and we use these to pay for the O(n) time it takes to 
rebuild the root. This completes the proof. □ 

8.1.2 Summary 

The following theorem summarizes the performance of the ScapegoatTree data structure: 

Theorem 8.1. A ScapegoatTree implements the SSet interface. Ignoring the cost of rebuild(u) 
operations, a ScapegoatTree supports the operations add(x), remove(x), and f ind(x) in O(logn) 
time per operation. 
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Furthermore, beginning with an empty ScapegoatTree, any sequence of m add(x) and 
remove(x) operations results in a total ofO(mlogm) time spent during all calls to rebuild(u). 

8.2 Discussion and Exercises 

The term scapegoat tree is due to Galperin and Rivest [31], who define and analyze these 
trees. However, the same structure was discovered earlier by Andersson [4, 6], who called 
them general balanced trees since they can have any shape as long as their height is small. 

Experimenting with the ScapegoatTree implementation will reveal that it is of- 
ten considerably slower than the other SSet implementations in this book. This may be 
somewhat surprising, since height bound of 

log 3/2 q«1.7091ogn + O(l) 

is better than the expected length of a search path in a Skiplist and not too far from that 
of a Treap. The implementation could be optimized by storing the sizes of subtrees ex- 
plicitly at each node or by reusing already computed subtree sizes (Exercises 8.3 and 8.4). 
Even with these optimizations, there will always be sequences of add(x) and delete(x) 
operation for which a ScapegoatTree takes longer than other SSet implementations. 

This gap in performance is due to the fact that, unlike the other SSet implementa- 
tions discussed in this book, a ScapegoatTree can spend a lot of time restructuring itself. 
Exercise 8.1 asks you to prove that there are sequences of n operations in which a Scape- 
goatTree will spend on the order of n log n time in calls to rebuild(u). This is in contrast to 
other SSet implementations discussed in this book that only make O(n) structural changes 
during a sequence of n operations. This is, unfortunately, a necessary consequence of the 
fact that a ScapegoatTree does all its restructuring by calls to rebuild(u) [18]. 

Despite their lack of performance, there are applications in which a Scapegoat- 
Tree could be the right choice. This would occur any time there is additional data asso- 
ciated with nodes that cannot be updated in constant time when a rotation is performed, 
but that can be updated during a rebuild(u) operation. In such cases, the Scapegoat- 
Tree and related structures based on partial rebuilding may work. An example of such an 
application is outlined in Exercise 8.10. 

Exercise 8.1. Show that, if we start with an empty ScapegoatTree and call add(x) for x = 
1,2, 3,..., n, then the total time spent during calls to rebuild(u) is at least en log n for some 
constant c > 0. 

Exercise 8.2. The ScapegoatTree, as described in this chapter guarantees that the length of 
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the search path does not exceed log 3/2 q- Design, analyze, and implement a modified ver- 
sion of ScapegoatTree where the length of the search path does not exceed log b q, where 
b is a parameter with 1 < b < 2. 

What does your analysis and/or experiments say about the amortized cost of f ind(x), 
add(x) and remove(x) as a function of b? 

Exercise 8.3. Modify the add(x) method of the ScapegoatTree so that it does not waste any 
time recomputing the sizes of subtrees that it has already computed the size of. This is pos- 
sible because, by the time the method wants to compute size(w), it has already computed 
size(w.lef t) or size(w.right). 

Compare the performance of your modified implementation with the implementa- 
tion given here. 

Exercise 8.4. Implement a second version of the ScapegoatTree data structure that ex- 
plicitly stores and maintains the sizes of the subtree rooted at each node. Compare the 
performance of the resulting implementation with that of the original ScapegoatTree im- 
plementation as well as the implementation from Exercise 8.3. 

Exercise 8.5. Reimplement the rebuild(u) method discussed at the beginning of this chap- 
ter so that it does not require the use of an array to store the nodes of the subtree being 
rebuilt. Instead, it should use recursion to first connect the nodes into a linked list and the 
convert this linked list into a perfectly balanced binary tree. 

Exercise 8.6. Analyze and implement a WeightBalancedTree. This is a tree in which each 
node u, except the root, maintains the balance invariant that size(u) < (2/3)size(u. parent). 
The add(x) and remove(x) operations are identical to the standard BinarySearchTree op- 
erations, except that any time the balance invariant is violated at a node u, the subtree 
rooted at u. parent is rebuilt. Your analysis should show that operations on a WeightBal- 
ancedTree run in O(logn) amortized time. 

Exercise 8.7. Analyze and implement a WeightBalancedTree2. This is a tree in which each 
node u, except the root, maintains the balance invariant that size(u) < (2/3)size(u. parent). 
The add(x) and remove(x) operations are identical to the standard BinarySearchTree op- 
erations, except that any time the balance invariant is violated at a node u, rotations are 
performed in order to restore these invariants. 

Your analysis should show that operations on a WeightBalancedTree2 run in 0(log n) 
worst-case time. 

Exercise 8.8. Analyze and implement a CountdownTree. In a CountdownTree each node u 
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keeps a timer u.t. The add(x) and remove(x) operations are exactly the same as in a stan- 
dard BinarySearchTree except that, whenever one of these operations affects u's subtree, 
u.t is decremented. When u.t = the entire subtree rooted at u is rebuilt into a perfectly 
balanced binary search tree. When a node u is involved in a rebuilding operation (either 
because u is rebuilt or one of u's ancestors is rebuilt) u.t is reset to size(u)/3. 

Your analysis should show that operations on a CountdownTree run in O(logn) 
amortized time. (Hint: First show that each node u satisfies some version of a balance 
invariant.) 

Exercise 8.9. Analyze and implement a DynamiteTree. In a DynamiteTree each node u 
keeps tracks of the size of the subtree rooted at u in a variable u.size. The add(x) and 
remove(x) operations are exactly the same as in a standard BinarySearchTree except that, 
whenever one of these operations affects a node u's subtree, u explodes with probability 
1/u.size. When u explodes, its entire subtree is rebuilt into a perfectly balanced binary 
search tree. 

Your analysis should show that operations on a DynamiteTree run in O(logn) ex- 
pected time. 

Exercise 8.10. Design and implement a Sequence data structure that maintains a sequence 
(list) of elements. It supports these operations: 

• addAf ter(e): Add a new element after the element e in the sequence. Return the 
newly added element. (If e is null, the new element is added at the beginning of the 
sequence.) 

• remove(e): Remove e from the sequence. 

• testBefore(e1,e2): return true if and only if e1 comes before e2 in the sequence. 

The first two operations should run in O(logn) amortized time. The third operation should 
run in constant-time. 

The Sequence data structure can be implemented by storing the elements in some- 
thing like a ScapegoatTree, in the same order that they occur in the sequence. To imple- 
ment testBefore(e1,e2) in constant time, each element e is labelled with an integer that 
encodes the path from the root to e. In this way, testBefore(e1,e2) can be implemented 
just by comparing the labels of e 1 and e2. 
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Chapter 9 
Red-Black Trees 



In this chapter, we present red-black trees, a version of binary search trees that have loga- 
rithmic depth. Red-black trees are one of the most widely-used data structures in practice. 
They appear as the primary search structure in many library implementations, including 
the Java Collections Framework and several implementations of the C++ Standard Tem- 
plate Library. They are also used within the Linux operating system kernel. There are 
several reasons for the popularity of red-black trees: 

1. A red-black tree storing n values has height at most 21ogn. 

2. The add(x) and remove(x) operations on a red-black tree run in O(logn) worst-case 
time. 

3. The amortized number of rotations done during an add(x) or remove(x) operation is 
constant. 

The first two of these properties already put red-black trees ahead of skiplists, treaps, and 
scapegoat trees. Skiplists and treaps rely on randomization and their O(logn) running 
times are only expected. Scapegoat trees have a guaranteed bound on their height, but 
add(x) and remove(x) only run in O(logn) amortized time. The third property is just icing 
on the cake. It tells us that that the time needed to add or remove an element x is dwarfed 
by the time it takes to find x. 1 

However, the nice properties of red-black trees come with a price: implementation 
complexity. Maintaining a bound of 21ogn on the height is not easy. It requires a careful 
analysis of a number of cases and it requires that the implementation does exactly the right 



Note that skiplists and treaps also have this property in the expected sense. See Exercise 4.1 and Exer- 
cise 7.4. 
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9.1. 2-4 Trees 




Figure 9.1: A 2-4 tree of height 3. 

thing in each case. One misplaced rotation or change of color produces a bug that can be 
very difficult to understand and track down. 

Rather than jumping directly into the implementation of red-black trees, we will 
first provide some background on a related data structure: 2-4 trees. This will give some 
insight into how red-black trees were discovered and why efficiently maintaining red- 
black trees is even possible. 

9.1 2-4 Trees 

A 2-4 tree is a rooted tree with the following properties: 

Property 9.1 (height). All leaves have the same depth. 

Property 9.2 (degree). Every internal node has 2, 3, or 4 children. 

An example of a 2-4 tree is shown in Figure 9.1. The properties of 2-4 trees imply 
that their height is logarithmic in the number of leaves: 

Lemma 9.1. A 2-4 tree with n leaves has height at most logn. 

Proof. The lower-bound of 2 on the degree of an internal node implies that, if the height 
of a 2-4 tree is h, then it has at least 2 h leaves. In other words, 

n > 2 h . 

Taking logarithms on both sides of this equation gives h < logn. □ 

9.1.1 Adding a Leaf 

Adding a leaf to a 2-4 tree is easy (see Figure 9.2). If we want to add a leaf u as the child 
of some node w on the second-last level, we simply make u a child of w. This certainly 
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Figure 9.2: Adding a leaf to a 2-4 Tree. This process stops after one split because w.parent 
has degree less than 4 before the addition. 

maintains the height property, but could violate the degree property; if w had 4 children 
prior to adding u, then w now has 5 children. In this case, we split w into two nodes, w 
and w', having 2 and 3 children, respectively. But now w' has no parent, so we recursively 
make w' a child of w's parent. Again, this may cause w's parent to have too many children 
in which case we split it. This process goes on until we reach a node that has fewer than 4 
children, or until we split the root, r, into two nodes r and r'. In the latter case, we make 
a new root that has r and c' as children. This simultaneously increases the depth of all 
leaves and so maintains the height property. 

Since the height of the 2-4 tree is never more than logn, the process of adding a 
leaf finishes after at most logn steps. 
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9.1.2 Removing a Leaf 

Removing a leaf from a 2-4 tree is a little more tricky (see Figure 9.3). To remove a leaf u 
from its parent w, we just remove it. If w had only two children prior to the removal of u, 
then w is left with only one child and violates the degree property. 

To correct this, we look at w's sibling, w'. The node w' is sure to exist since w's parent 
has at least 2 children. If w' has 3 or 4 children, then we take one of these children from w' 
and give it to w. Now w has 2 children and w' has 2 or 3 children and we are done. 

On the other hand, if w' has only two children, then we merge w and w' into a single 
node, w, that has 3 children. Next we recursively remove w' from the parent of w'. This 
process ends when we reach a node, u, where u or its sibling has more than 2 children; or 
we reach the root. In the latter case, if the root is left with only 1 child, then we delete the 
root and make its child the new root. Again, this simultaneously decreases the height of 
every leaf and therefore maintains the height property. 

Again, since the height of the tree is never more than log n, the process of removing 
a leaf finishes after at most logn steps. 

9.2 RedBlackTree: A Simulated 2-4 Tree 

A red-black tree is a binary search tree in which each node, u, has a color which is either 
red or black. Red is represented by the value and black by the value 1. 
RedBlackTree 



class Node<T> extends BinarySearchTree.BSTNode<Node<T>,T> { 

byte color; 
} 



Before and after any operation on a red-black tree, the following two properties are 
satisfied. Each property is defined both in terms of the colors red and black, and in terms 
of the numeric values and 1 . 

Property 9.3 (black-height). There are the same number of black nodes on every root to 
leaf path. (The sum of the colors on any root to leaf path is the same.) 

Property 9.4 (no-red-edge). No two red nodes are adjacent. (For any node u, except the 
root, u. color + u. parent. color > 1.) 

Notice that we can always color the root, r, of a red-black tree black without vi- 
olating either of these two properties, so we will assume that the root is black, and the 
algorithms for updating a red-black tree will maintain this. Another trick that simplifies 
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Figure 9.3: Removing a leaf from a 2-4 Tree. This process goes all the way to the root 
because all of u's ancestors and their siblings have degree 2. 
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Figure 9.4: An example of a red-black tree with black-height 3. External (nil) nodes are 
drawn as squares. 

red-black trees is to treat the external nodes (represented by nil) as black nodes. This way, 
every real node, u, of a red-black tree has exactly two children, each with a well-defined 
color. An example of a red-black tree is shown in Figure 9.4. 

9.2.1 Red-Black Trees and 2-4 Trees 

At first it might seem surprising that a red-black tree can be efficiently updated to maintain 
the black-height and no-red-edge properties, and it seems unusual to even consider these 
as useful properties. However, red-black trees were designed to be an efficient simulation 
of 2-4 trees as binary trees. 

Refer to Figure 9.5. Consider any red-black tree, T , having n nodes and perform the 
following transformation: Remove each red node u and connect u's two children directly 
to the (black) parent of u. After this transformation we are left with a tree T having only 
black nodes. 

Every internal node in T' has 2, 3, or 4 children: A black node that started out with 
two black children will still have two black children after this transformation. A black 
node that started out with one red and one black child will have three children after this 
transformation. A black node that started out with two red children will have 4 children 
after this transformation. Furthermore, the black-height property now guarantees that 
every root-to-leaf path in T has the same length. In other words, T is a 2-4 tree! 

The 2-4 tree T' has n + 1 leaves that correspond to the n + 1 external nodes of the 
red-black tree. Therefore, this tree has height log(n + 1 ). Now, every root to leaf path in the 
2-4 tree corresponds to a path from the root of the red-black tree T to an external node. 
The first and last node in this path are black and at most one out of every two internal 
nodes is red, so this path has at most log(n + 1) black nodes and at most log(n + 1) - 1 red 
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Figure 9.5: Every red-black tree has a corresponding 2-4 tree. 

nodes. Therefore, the longest path from the root to any internal node in T is at most 

21og(n + l)-2< 21ogn , 
for any n > 1. This proves the most important property of red-black trees: 
Lemma 9.2. The height of red-black tree with n nodes is at most 21ogn. 

Now that we have seen the relationship between 2-4 trees and red-black trees, it 
is not hard to believe that we can efficiently maintain a red-black tree while adding and 
removing elements. 

We have already seen that adding an element in a BinarySearchTree can be done 
by adding a new leaf. Therefore, to implement add(x) in a red-black tree we need a method 
of simulating splitting a degree 5 node in a 2-4 tree. A degree 5 node is represented by a 
black node that has two red children one of which also has a red child. We can "split" this 
node by coloring it red and coloring its two children black. An example of this is shown 
in Figure 9.6. 

Similarly implementing remove(x) requires a method of merging two nodes and 
borrowing a child from a sibling. Merging two nodes is the inverse of a split (shown in 
Figure 9.6), and involves coloring two (black) siblings red and coloring their (red) parent 
black. Borrowing from a sibling is the most complicated of the procedures and involves 
both rotations and recoloring of nodes. 
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Figure 9.6: Simulating a 2-4 tree split operation during an addition in a red-black tree. 
(This simulates the 2-4 tree addition shown in Figure 9.2.) 
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Of course, during all of this we must still maintain the no-red-edge property and 
the black-height property. While it is no longer surprising that this can be done, there are 
a large number of cases that have to be considered if we try to do a direct simulation of 
a 2-4 tree by a red-black tree. At some point, it just becomes simpler to forget about the 
underlying 2-4 tree and work directly towards maintaining the red-black tree properties. 

9.2.2 Left-Leaning Red-Black Trees 

There is no single definition of a red-black tree. Rather, there are a family of structures 
that manage to maintain the black-height and no-red-edge properties during add(x) and 
remove(x) operations. Different structures go about it in different ways. Here, we imple- 
ment a data structure that we call a RedBlackTree. This structure implements a particular 
variant of red-black trees that satisfies an additional property: 

Property 9.5 (left-leaning). At any node u, if u.lef t is black, then u. right is black. 

Note that the red-black tree shown in Figure 9.4 does not satisfy the left-leaning 
property; it is violated by the parent of the red node in the rightmost path. 

The reason for maintaining the left-leaning property is that it reduces the number 
of cases encountered when updating the tree during add(x) and remove(x) operations. In 
terms of 2-4 trees, it implies that every 2-4 tree has a unique representation: A node of 
degree 2 becomes a black node with 2 black children. A node of degree 3 becomes a black 
node whose left child is red and whose right child is black. A node of degree 4 becomes a 
black node with two red children. 

Before we describe the implementation of add(x) and remove(x) in detail, we first 
present some simple subroutines used by these methods that are illustrated in Figure 9.7. 
The first two subroutines are for manipulating colors while preserving the black-height 
property. The pushBlack(u) method takes as input a black node u that has two red chil- 
dren and colors u red and its two children black. The pullBlack(u) method reverses this 
operation: 
RedBlackTree 



void pushBlack(Node<T> u) { 
u . color-- ; 
u . left .color++; 
u. right .color++; 

} 

void pullBlack(Node<T> u) { 

u.color++; 

u.lef t .color-- ; 
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A u A u A u A u 

</\ /\ S\ <A> 

pushBlack(u) pullBlack(u) flipLeft(u) flipRight(u) 

H H H H 



/K <A> yA 




Figure 9.7: Flips, pulls and pushes 



u .right .color-- ; 
} 



The flipLef t(u) method swaps the colors of u and u. right and then performs a 

left rotation at u. This reverses the colors of these two nodes as well as their parent-child 

relationship: 
RedBlackTree 



void flipl_eft(Node<T> u) { 
swapColors(u, u. right); 
rotateLef t (u) ; 

} 



The f lipLef t(u) operation is especially useful in restoring the left-leaning property at a 

node u that violates it (because u.lef t is black and u. right is red). In this special case, we 

can be assured this operation preserves both the black-height and no-red-edge properties. 

The f lipRight(u) operation is symmetric to f lipLef t(u) with the roles of left and right 

reversed. 
RedBlackTree 



void flipRight(Node<T> u) { 
swapColors(u, u.left); 
rotateRight (u) ; 

} 



9.2.3 Addition 

To implement add(x) in a RedBlackTree, we perform a standard BinarySearchTree inser- 
tion, which adds a new leaf, u, with u.x = x and set u. color = red. Note that this does not 
change the black height of any node, so it does not violate the black-height property. It 
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may, however, violate the left-leaning property (if u is the right child of its parent) and it 

may violate the no-red-edge property (if u's parent is red). To restore these properties, we 

call the method addFixup(u). 
RedBlackTree 



boolean add(T x) { 

Node<T> u = newNode(x); 

u. color = red; 

boolean added = add(u); 

if (added) 
addFixup(u) ; 

return added; 
} 



The addFixup(u) method, illustrated in Figure 9.8, takes as input a node u whose 
color is red and which may be violating the no-red-edge property and/or the left-leaning 
property. The following discussion is probably impossible to follow without referring to 
Figure 9.8 or recreating it on a piece of paper. Indeed, the reader may wish to study this 
figure before continuing. 

If u is the root of the tree, then we can color u black and this restores both proper- 
ties. If u's sibling is also red, then u's parent must be black, so both the left-leaning and 
no-red-edge properties already hold. 

Otherwise, we first determine if u's parent, w, violates the left-leaning property 
and, if so, perform a f lipLef t(w) operation and set u = w. This leaves us in a well-defined 
state: u is the left child of its parent, w, so w now satisfies the left-leaning property. All that 
remains is to ensure the no-red-edge property at u. We only have to worry about the case 
where w is red, since otherwise u already satisfies the no-red-edge property. 

Since we are not done yet, u is red and w is red. The no-red-edge property (which 
is only violated by u and not by w) implies that u's grandparent g exists and is black. If g's 
right child is red, then the left-leaning property ensures that both g's children are red, and 
a call to pushBlack(g) makes g red and w black. This restores the no-red-edge property at 
u, but may cause it to be violated at g, so the whole process starts over with u = g. 

If g's right child is black, then a call to flipRight(g) makes w the (black) parent 

of g and gives w two red children, u and g. This ensures that u satisfies the no-red-edge 

property and g satisfies the left-leaning property. In this case we can stop. 
RedBlackTree 



void addFixup(Node<T> u) { 
while (u. color == red) { 

if (u == r) { // u is the root - done 
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flipRight(g 





pushBlack(g) 




pushBlack(g) 




Figure 9.8: A single round in the process of fixing Property 2 after an insertion. 
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u. color = black; 
return; 

} 

Node<T> w = u. parent; 

if (w. left .color == black) { // ensure left-leaning 

flipLeft(w) ; 

u = w; 

w = u. parent; 

} 

if (w. color == black) 

return; //no red-red edge = done 
Node<T> g = w. parent; // grandparent of u 
if (g. right . color == black) { 

flipRight(g); 

return; 
} else { 

pushBlack(g) ; 

u = g; 
} 



} 



} 



The insertFixup(u) method takes constant time per iteration and each iteration 
either finishes or moves u closer to the root. This implies that the insertFixup(u) method 
finishes after O(logn) iterations in O(logn) time. 

9.2.4 Removal 

The remove(x) operation in a RedBlackTree tree is the most complicated operation to im- 
plement, and this is true of all known implementations. Like remove(x) in a Binary- 
SearchTree, this operation boils down to finding a node w with only one child, u, and 
splicing w out of the tree by having w.parent adopt u. 

The problem with this is that, if w is black, then the black-height property will now 

be violated at w.parent. We get around this problem, temporarily, by adding w.color to 

u. color. Of course, this introduces two other problems: (1) u and w both started out black, 

then u. color + w.color = 2 (double black), which is an invalid color. If w was red, then it is 

replaced by a black node u, which may violate the left-leaning property at u. parent. Both 

of these problems are resolved with a call to the removeFixup(u) method. 
RedBlackTree 



boolean remove(T x) { 

Node<T> u = findLast(x); 

if (u == nil | | compare(u . x, x) != 0) 
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return false; 
Node<T> w = u. right; 
if (w == nil) { 

w = u; 

u = w.left; 
} else { 

while (w.left != nil) 
w = w.left; 

u . x = w. x; 

u = w. right; 

} 

splice(w) ; 

u. color += w. color; 

u. parent = w. parent; 

removeFixup(u) ; 

return true; 



} 



The removeFixup(u) method takes as input a node u whose color is black (1) or 

double-black (2). If u is double-black, then removeFixup(u) performs a series of rotations 

and recoloring operations that move the double-black node up the tree until it can be 

gotten rid of. During this process, the node u changes until, at the end of this process, u 

refers to the root of the subtree that has been changed. The root of this subtree may have 

changed color. In particular, it may have gone from red to black, so the removeFixup(u) 

method finishes by checking if u's parent violates the left-leaning property and, if so, fixes 

it. 
RedBlackTree 



void removeFixup(Node<T> u) { 
while (u. color > black) { 
if (u == r) { 

u. color = black; 
} else if (u. parent . left . color == red) { 

u = removeFixupCasel (u) ; 
} else if (u == u . parent .left ) { 

u = removeFixupCase2(u) ; 
} else { 

u = removeFixupCase3(u) ; 
} 
} 

if (u != r) { // restore left-leaning property, if necessary 
Node<T> w = u. parent; 

if (w. right . color == red && w.left. color == black) { 
flipLeft(w); 
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} 
} 
} 



The removeFixup(u) method is illustrated in Figure 9.9. Again, the following text 
will be very difficult, if not impossible, to follow without referring constantly to Figure 9.9. 
Each iteration of the loop in removeFixup(u) processes the double-black node u based on 
one of four cases. 

Case 0: u is the root. This is the easiest case to treat. We recolor u to be black and this does 
not violate any of the red-black tree properties. 

Case 1: u's sibling, v, is red. In this case, u's sibling is the left child of its parent, w (by the 
left-leaning property). We perform a right-flip at w and then proceed to the next iteration. 
Note that this causes w's parent to violate the left-leaning property and it causes the depth 
of u to increase. However, it also implies that the next iteration will be in Case 3 with w 
colored red. When examining Case 3, below, we will see that this means the process will 
stop during the next iteration. 
RedBlackTree 



Node<T> removeFixupCasel (Node<T> u) { 

flipRight(u. parent) ; 

return u; 
} 



Case 2: u's sibling, v, is black and u is the left child of its parent, w. In this case, we call 
pullBlack(w), making u black, v red, and darkening the color of w to black or double- 
black. At this point, w does not satisfy the left-leaning property so we call f lipLef t(w) to 
fix this. 

At this point, w is red and v is the root of the subtree we started with. We need to 
check if w causes no-red-edge property to be violated. We do this by inspecting w's right 
child, q. If q is black, then w satisfies the no-red-edge property and we can continue to the 
next iteration with u = v. 

Otherwise (q is red), both the no-red-edge property and the left-leaning property 
are violated at q and w, respectively. A call to rotateLef t(w) restores the left-leaning 
property, but the no-red-edge property is still violated. At this point, q is the left child 
of v and w is the left child of q, q and w are both red and v is black or double-black. A 
flipRight(v) makes q the parent of both v and w. Following this up by a pushBlack(q) 
makes both v and w black and sets the color of q back to the original color of w. 
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Figure 9.9: A single round in the process of eliminating a double-black node after a re- 
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At this point, there is no more double-black node and the no-red-edge and black- 
height properties are reestablished. The only possible problem that remains is that the 
right child of v may be red, in which case the left-leaning property is violated. We check 

this and perform a flipLef t(v) to correct it if necessary. 
RedBlackTree 



Node<T> removeFixupCase2(Node<T> u) { 
Node<T> w = u. parent; 
Node<T> v = w. right; 
pullBlack(w) ; // w.left 
flipLef t (w) ; // w is now red 
Node<T> q = w. right; 
if (q. color == red) { // q-w is red-red 

rotateLef t (w) ; 

flipRight(v); 

pushBlack(q) ; 

if (v. right . color == red) 
flipLeft(v); 

return q; 
} else { 

return v; 
} 
} 



Case 3: u's sibling is black and u is the right child of its parent, w. This case is symmetric to 
Case 2 and is handled mostly the same way. The only differences come from the fact that 
the left-leaning property is asymmetric, so it requires different handling. 

As before, we begin with a call to pullBlack(w), which makes v red and u black. A 
call to f lipRight(w) promotes v to the root of the subtree. At this point w is red, and the 
code branches two ways depending on the color of w's left child, q. 

If q is red, then the code finishes up exactly the same way that Case 2 finishes up, 
but is even simpler since there is no danger of v not satisfying the left-leaning property. 

The more complicated case occurs when q is black. In this case, we examine the 
color of v's left child. If it is red, then v has two red children and its extra black can be 
pushed down with a call to pushBlack(v). At this point, v now has w's original color and 
we are done. 

If v's left child is black then v violates the left-leaning property and we restore this 

with a call to f lipLef t(v). The next iteration of removeFixup(u) then continues with u = v. 
RedBlackTree 



Node<T> removeFixupCase3(Node<T> u) { 
Node<T> w = u. parent; 
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} 



II w is now red 
// q-w is red-red 



Node<T> v = w.left; 
pullBlack(w) ; 
flipRight(w) ; 
Node<T> q = w.left; 
if (q. color == red) { 

rotateRight(w) ; 

flipLeft(v); 

pushBlack(q) ; 

return q; 
} else { 

if (v. left .color == red) { 

pushBlack( v) ; // both v's children are red 



return v; 
} else { 

flipLeft(v); 

return w; 
} 



// ensure left-leaning 



Each iteration of removeFixup(u) takes constant time. Cases 2 and 3 either finish 
or move u closer to the root of the tree. Case (where u is the root) always terminates and 
Case 1 leads immediately to Case 3, which also terminates. Since the height of the tree is 
at most 21ogn, we conclude that there are at most O(logn) iterations of removeFixup(u) so 
removeFixup(u) runs in O(logn) time. 

9.3 Summary 

The following theorem summarizes the performance of the RedBlackTree data structure: 

Theorem 9.1. A RedBlackTree implements the SSet interface. A RedBlackTree supports the 
operations add(x), remove(x), and find(x) in O(logn) worst-case time per operation. 

Not included in the above theorem is the extra bonus 

Theorem 9.2. Beginning with an empty RedBlackTree, any sequence of m add(x) and remove(x) 
operations results in a total of O(m) time spent during all calls addFixup(u) and removeFixup(u). 

We will only sketch a proof of Theorem 9.2. By comparing addFixup(u) and removeFixup(u) 
with the algorithms for adding or removing a leaf in a 2-4 tree, we can convince ourselves 
that this property is something that is inherited from a 2-4 tree. In particular, if we can 
show that the total time spent splitting, merging, and borrowing in a 2-4 tree is O(m), then 
this implies Theorem 9.2. 
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The proof of this for 2-4 trees uses the potential method of amortized analysis. 2 
Define the potential of an internal node u in a 2-4 tree as 



<D(u) = 



1 if u has 2 children 
if u has 3 children 
3 if u has 4 children 



and the potential of a 2-4 tree as the sum of the potentials of its nodes. When a split occurs, 
it is because a node of degree 4 becomes two nodes, one of degree 2 and one of degree 3. 
This means that the overall potential drops by 3 - 1 - = 2. When a merge occurs, two 
nodes that used to have degree 2 are replaced by one node of degree 3. The result is a drop 
in potential of 2 - = 2. Therefore, for every split or merge, the potential decreases by 2. 

Next notice that, if we ignore splitting and merging of nodes, there are only a con- 
stant number of nodes whose number of children is changed by the addition or removal of 
a leaf. When adding a node, one node has its number of children increase by 1, increasing 
the potential by at most 3. During the removal of a leaf, one node has its number of chil- 
dren decrease by 1, increasing the potential by at most 1, and two nodes may be involved 
in a borrowing operation, increasing their total potential by at most 1. 

To summarize, each merge and split causes the potential to drop by at least 2. 
Ignoring merging and splitting, each addition or removal causes the potential to rise by 
at most 3, and the potential is always non-negative. Therefore, the number of splits and 
merges caused by m additions or removals on an initially empty tree is at most 3m/2. 
Theorem 9.2 is a consequence of this analysis and the correspondence between 2-4 trees 
and red-black trees. 

9.4 Discussion and Exercises 

Red-black trees were first introduced by Guibas and Sedgewick [36]. Despite their high 
implementation complexity they are found in some of the most commonly used libraries 
and applications. Most algorithms and data structures textbooks discuss some variant of 
red-black trees. 

Andersson [5] describes a left-leaning version of balanced trees that are similar to 
red-black trees but have the additional constraint that any node has at most one red child. 
This implies that these trees simulate 2-3 trees rather than 2-4 trees. They are significantly 
simpler, though, than the RedBlackTree structure presented in this chapter. 



"See the proofs of Lemma 2.2 and Lemma 3.1 for other applications of the potential method. 
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Sedgewick [64] describes at least two versions of left-leaning red-black trees. These 
use recursion along with a simulation of top-down splitting and merging in 2-4 trees. The 
combination of these two techniques makes for particularly short and elegant code. 

A related, and older, data structure is the AVL tree [2]. AVL trees are height- 
balanced: At each node u, the height of the subtree rooted at u.lef t and the subtree rooted 
at u. right differ by at most one. It follows immediately that, if F(h) is the minimum num- 
ber of leaves in a tree of height h, then F(h) obeys the Fibonacci recurrence 

F(h) = F(h-l) + F(h-2) 

with base cases F(0) = 1 and F(l) = 1. This means F(h) is approximately (p h /y5, where 
<p = (1 + V5)/2 « 1.61803399 is the golden ratio. (More precisely \<p h /yf5 - F(h)\ < 1/2.) 
Arguing as in the proof of Lemma 9.1, this implies 

/i < log n« 1.440420088 log n , 

so AVL trees have smaller height than red-black trees. The height-balanced property can 
be maintained during add(x) and remove(x) operations by walking back up the path to the 
root and performing a rebalancing operation at each node u where the height of u's left 
and right subtrees differ by 2. See Figure 9.10. 

Andersson's variant of red-black trees, Sedgewick's variant of red-black trees, and 
AVL trees are all simpler to implement than the RedBlackTree structure defined here. 
Unfortunately, none of them can guarantee that the amortized time spent rebalancing is 
0(1) per update. In particular, there is no analogue of Theorem 9.2 for those structures. 

Exercise 9.1. Why does the method remove(x) in the RedBlackTree implementation per- 
form the assignment u. parent =w.parent? Shouldn't this already be done by the call to 
splice(w)? 

Exercise 9.2. Suppose a 2-4 tree, T, has n^ leaves and n, internal nodes. 

1. What is the minimum value of n,, as a function of n^? 

2. What is the maximum value of n,, as a function of n^? 

3. If T" is a red-black tree that represents T, then how many red nodes does T" have? 

Exercise 9.3. Suppose you are given a binary search tree with n nodes and height at most 
21ogn - 2. Is it always possible to color the nodes red and black so that the tree satisfies 
the black-height and no-red-edge properties? If so, can it also be made to satisfy the left- 
leaning property? 
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h + 2 





h+2 





h+l 





Figure 9.10: Rebalancing in an AVL tree. At most 2 rotations are required to convert a 
node whose subtrees have height h and h + 2 into a node whose subtrees each have height 
at most h+l. 
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Exercise 9.4. Prove that, during an add(x) operation, an AVL tree must perform at most one 
rebalancing operation (that involves at most 2 rotations; see Figure 9.10). Give an example 
of an AVL tree and a remove(x) operation on that tree that requires on the order of logn 
rebalancing operations. 

Exercise 9.5. Implement an AVLTree class that implements AVL trees as described above. 
Compare its performance to that of the RedBlackTree implementation. Which implemen- 
tation has a faster f ind(x) operation? 

Exercise 9.6. Design and implement a series of experiments that compare the relative per- 
formance of f ind(x), add(x), and remove(x) for SkiplistSSet, ScapegoatTree, Treap, and 
RedBlackTree. Be sure to include multiple test scenarios, including cases where the data 
is random, already sorted, is removed in random order, is removed in sorted order, and so 
on. 
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Chapter 10 



Heaps 



In this chapter, we discuss two implementations of the extremely useful priority Queue 
data structure. Both of these structures are a special kind of binary tree called a heap, 
which means "a disorganized pile." This is in contrast to binary search trees that can 
thought of as a highly organized pile. 

The first heap implementation uses an array to simulate a complete binary tree. It 
is very fast and is the basis of one of the fastest known sorting algorithms, namely heapsort 
(see Section 11.1.3). The second implementation is based on more flexible binary trees. It 
supports a meld(h) operation that allows the priority queue to absorb the elements of a 
second priority queue h. 

10.1 BinaryHeap: An Implicit Binary Tree 

Our first implementation of a (priority) Queue is based on a technique that is over 400 

years old. Eytzinger's method allows us to represent a complete binary tree as an array. This 

is done by laying out the nodes of the tree in breadth-first order (see Section 6.1.2) in the 

array. In this way the root is stored at position 0, the root's left child is stored at position 

1, the root's right child at position 2, the left child of the left child of the root is stored at 

position 3, and so on. See Figure 10.1. 

If we do this for a large enough tree, some patterns emerge. The left child of the 

node at index i is at index lef t(i) = 2i + 1 and the right child of the node at index i is at 

index right(i) = 2i + 2. The parent of the node at index i is at index parent(i) = (i- l)/2. 

BinaryHeap 



int left(int i) { 
return 2*i + 1; 

} 

int right(int i) { 

return 2*i + 2; 
} 
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Figure 10.1: Eytzinger's method represents a complete binary tree as an array. 



int parent(int i) { 

return (i-1)/2; 
} 



A BinaryHeap uses this technique to implicitly represent a complete binary tree in 
which the elements are heap-ordered: The value stored at any index i is not smaller than 
the value stored at index parent(i), with the exception of the root value, 1 = 0. It follows 
that the smallest value in the priority Queue is therefore stored at position (the root). 

In a BinaryHeap, the n elements are stored in an array a: 
BinaryHeap 



T[] a; 
int n; 



Implementing the add(x) operation is fairly straightforward. As with all array- 
based structures, we first check if a is full (because a. length = n) and, if so, we grow a. 
Next, we place x at location a[n] and increment n. At this point, all that remains is to 
ensure that we maintain the heap property. We do this by repeatedly swapping x with its 

parent until x is no longer smaller than its parent. See Figure 10.2. 

BinaryHeap 



boolean add(T x) { 

if (n + 1 > a. length) resize(); 
a[n++] = x; 
bubbleUp(n-l); 
return true; 

} 

void bubbleUp(int i) { 
int p = parent (i) ; 
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while (i > && compare(a[i] , a[p]) < 0) { 
swap(i,p) ; 

i = p; 

p = parent (i) ; 

} 
} 



Implementing the remove() operation, which removes the smallest value from the 
heap, is a little trickier. We know where the smallest value is (at the root), but we need to 
replace it after we remove it and ensure that we maintain the heap property. 

The easiest way to do this is to replace the root with the value a[n - 1], delete that 
value, and decrement n. Unfortunately, the new root element is now probably not the 
smallest element, so it needs to be moved downwards. We do this by repeatedly compar- 
ing this element to its two children. If it is the smallest of the three then we are done. 
Otherwise, we swap this element with the smallest of its two children and continue. 
BinaryHeap 



T remove ( ) { 
T x = a[0]; 
a[0] = a[--n]; 
trickleDown(O) ; 
if (3*n < a. length) resize(); 
return x; 

} 

void trickleDown(int i) { 
do { 

int j = -1; 
int r = right (i ) ; 

if (r < n && compare (a [r] , a[i]) < 0) { 
int 1 = left(i); 
if (compare(a[l] , a[r]) < 0) { 

j = l; 

} else { 
j = r; 

} 
} else { 

int 1 = left(i) ; 

if (1 < n && compare(a[l], a[i]) < 0) { 
j = l; 

} 
} 
if ( j >= 0) swap(i, j) ; 

i = j; 
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Figure 10.2: Adding the value 6 to a BinaryHeap. 
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} while (i >= 0); 
} 



As with other array-based structures, we will ignore the time spent in calls to 
resizeQ, since these can be accounted for with the amortization argument from Lemma 2.1. 
The running times of both add(x) and remove() then depend on the height of the (implicit) 
binary tree. Luckily this is a complete binary tree; every level except the last has the maxi- 
mum possible number of nodes. Therefore, if the height of this tree is h, then it has at least 
2 h nodes. Stated another way 

n>2* . 

Taking logarithms on both sides of this equation gives 

h < log n . 
Therefore, both the add(x) and remove() operation run in O(logn) time. 
10.1.1 Summary 
The following theorem summarizes the performance of a BinaryHeap: 

Theorem 10.1. A BinaryHeap implements the (priority) Queue interface. Ignoring the cost of 
calls to resizeQ, a BinaryHeap supports the operations add(x) and remove() in O(logn) time 
per operation. 

Furthermore, beginning with an empty BinaryHeap, any sequence of m add(x) and 
remove() operations results in a total ofO(m) time spent during all calls to resizeQ. 

10.2 MeldableHeap: A Randomized Meldable Heap 

In this section, we describe the MeldableHeap, a priority Queue implementation in which 
the underlying structure is also a heap-ordered binary tree. However, unlike a BinaryHeap 
in which the underlying binary tree is completely defined by the number of elements, there 
are no restrictions on the shape of the binary tree that underlies a MeldableHeap; anything 
goes. 

The add(x) and removeQ operations in a MeldableHeap are implemented in terms 
of the merge(h1,h2) operation. This operation takes two heap nodes hi and h2 and merges 
them, returning a heap node that is the root of a heap that contains all elements in the 
subtree rooted at hi and all elements in the subtree rooted at h2. 

The nice thing about a merge(h1,h2) operation is that it can be defined recursively. 
If either of hi or h2 is nil, then we are merging with an empty set, so we return h2 or 
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Figure 10.3: Removing the minimum value, 4, from a BinaryHeap. 
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hi, respectively. Otherwise, assume hl.x < h2.x since, if hl.x > h2.x, then we can reverse 

the roles of hi and h2. Then we know that the root of the merged heap will contain hl.x 

and we can recursively merge h2 with hl.left or hi. right, as we wish. This is where 

randomization comes in, and we toss a coin to decide whether to merge h2 with hl.left 

or hi. right: 

MeldableHeap 



Node<T> merge(Node<T> hi, Node<T> h2) { 
if (hi == nil) return h2; 
if (h2 == nil) return hi; 

if (compare(h2. x, hl.x) < 0) return merge(h2, hi); 
// now we know hl.x <= h2.x 
if (rand. nextBoolean( ) ) { 

hl.left = merge (hi .left, h2); 

hi .left .parent = hi; 
} else { 

hi. right = merge(h1 .right , h2); 

hi .right .parent = hi; 

} 

return hi; 



In the next section, we show that merge(h1,h2) runs in O(logn) expected time, 
where n is the total number of elements in hi and h2. 

With access to a merge(h1,h2) operation, the add(x) operation is easy. We create a 
new node u containing x and then merge u with the root of our heap: 



b 


Dolean add(T 


x) { 






Node<T> u = 


newNode 


0; 




u . x = x; 








r = merge (u 


r); 






r. parent = nil; 






n++; 








return true 






} 









This takes 0(log(n + 1)) = O(logn) expected time. 

The remove() operation is similarly easy. The node we want to remove is the root, 

so we just merge its two children and make the result the root: 

MeldableHeap 



T remove ( ) { 
T x = r. x; 
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r = merge(r.lef t , r. right); 
if (r != nil) r. parent = nil; 
n--; 

return x; 
} 



Again, this takes O(logn) expected time. 

Additionally, a MeldableHeap can implement many other operations in O(logn) 
expected time, including: 

• remove(u): remove the node u (and its key u.x) from the heap. 

• absorb(h): add all the elements of the MeldableHeap h to this heap, emptying h in 
the process. 

Each of these operations can be implemented using a constant number of merge(h1,h2) 
operations that each take O(logn) time. 

10.2.1 Analysis of merge(h1,h2) 

The analysis of merge(h1,h2) is based on the analysis of a random walk in a binary tree. 
A random walk in a binary tree is a walk that starts at the root of the tree. At each step 
in the walk, a coin is tossed and the walk proceeds to the left or right child of the current 
node depending on the result of this coin toss. The walk ends when it falls off the tree (the 
current node becomes nil). 

The following lemma is somewhat remarkable because it does not depend at all on 
the shape of the binary tree: 

Lemma 10.1. The expected length of a random walk in a binary tree with n nodes is at most 
log(n+1). 

Proof. The proof is by induction on n. In the base case, n = and the walk has length 
= log(n + 1). Suppose now that the result is true for all non-negative integers n' < n. 

Let r^ denote the size of the root's left subtree, so that n 2 = n-n 1 -lis the size of 
the root's right subtree. Starting at the root, the walk takes one step and then continues in 
a subtree of size nj or continues in a subtree of size n 2 . By our inductive hypothesis, the 
expected length of the walk is then 

E[W] = l + ilog(n 1 + l)+ilog(n 2 +l) , 
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since each of rij and n 2 are less than n. Since log is a concave function, E[W] is maximized 
when ni = n 2 = (n - l)/2. Therefore, the expected number of steps taken by the random 

walk is 

E[W] = l + ilog(n 1 + l)+ilog(n 2 + l) 

<l + log((n-l)/2+l) 

= l+log((n + l)/2) 

= log(n + l) . □ 

We make a quick digression to note that, for readers who know a little about infor- 
mation theory, the proof of Lemma 10.1 can be stated in terms of entropy 

Information Theoretic Proof of Lemma 10.1. Let d,- denote the depth of the z'th external node 
and recall that a binary tree with n nodes has n + 1 external nodes. The probability of the 
random walk reaching the z'th external node is exactly p, = l/2 rfi , so the expected length of 
the random walk is given by 

n n n 

H = J^Pidi = J^ Pi log(2 d <) = J^ Pi log(l/ Pi ) 

i'=0 (=0 !=0 

The right hand side of this equation is easily recognizable as the entropy of a probability 
distribution over n + 1 elements. A basic fact about the entropy of a distribution over n + 1 
elements is that it does not exceed log(n + 1), which proves the lemma. □ 

With this result on random walks, we can now easily prove that the running time 
of the merge(h 1, h2) operation is 0(log n). 

Lemma 10.2. I/h1 and h2 are the roots of two heaps containing rij and n 2 nodes, respectively, 
then the expected running time o/merge(h1,h2) is at most O(logn), where n = x\\ + n 2 . 

Proof. Each step of the merge algorithm takes one step of a random walk, either in the heap 
rooted at hi or the heap rooted at h2 The algorithm terminates when either of these two 
random walks fall out of its corresponding tree (when h 1 = null or h2 = null). Therefore, 
the expected number of steps performed by the merge algorithm is at most 

log(n 1 + l) + log(n 2 + 1) < 21ogn . □ 



187 



1 0. Heaps 1 0.3. Discussion and Exercises 

10.2.2 Summary 

The following theorem summarizes the performance of a MeldableHeap: 

Theorem 10.2. A MeldableHeap implements the (priority) Queue interface. A MeldableHeap 
supports the operations add(x) and remove() in O(logn) expected time per operation. 

10.3 Discussion and Exercises 

The implicit representation of a complete binary tree as an array or list, seems to have 
been first proposed by Eytzinger [25], as a representation for pedigree family trees. The 
BinaryHeap data structure described here was first introduced by Williams [73]. 

The randomized MeldableHeap data structure described here appears to have first 
been proposed by Gambin and Malinowski [32]. Other meldable heap implementations 
exist, including leftist heaps [15, 46, Section 5.3.2], binomial heaps [70], Fibonacci heaps 
[30], pairing heaps [29], and skew heaps [68], although none of these are as simple as the 
MeldableHeap structure. 

Some of the above structures also support a decreaseKey(u, y) operation in which 
the value stored at node u is decreased to y. (It is a pre-condition that y < u.x.) This 
operation can be implemented in 0(log n) time in most of the above structures by removing 
node u and adding y. However, some of these structures can implement decreaseKey(u, y) 
more efficiently. In particular, decreaseKey(u, y) takes 0(1) amortized time in Fibonacci 
heaps and O(loglogn) amortized time in a special version of pairing heaps [23]. This 
more efficient decreaseKey(u,y) operation has applications in speeding up several graph 
algorithms including Dijkstra's shortest path algorithm [30]. 



Chapter 11 



Sorting Algorithms 



This chapter discusses algorithms for sorting a set of n items. This might seem like a 
strange topic for a book on data structures, but there are several good reasons for including 
it here. The most obvious reason is that two of these sorting algorithms (quicksort and 
heap-sort) are intimately related to two of the data structures we have already studied 
(random binary search trees and heaps, respectively). 

The first part of this chapter discusses algorithms that sort using only comparisons 
and presents three algorithms that run in O(nlogn) time. As it turns out, all three al- 
gorithms are asymptotically optimal; no algorithm that uses only comparisons can avoid 
doing roughly nlogn comparisons in the worst-case and even the average-case. 

Before continuing, we should note that any of the SSet or priority Queue imple- 
mentations presented in previous chapters can also be used to obtain an O(nlogn) time 
sorting algorithm. For example, we can sort n items by performing n add(x) operations 
followed by n remove() operations on a BinaryHeap or MeldableHeap. Alternatively, we 
can use n add(x) operations on any of the binary search tree data structures and then per- 
form an in-order traversal (Exercise 6.8) to extract the elements in sorted order. However, 
in both cases we go through a lot of overhead to build a structure that is never fully used. 
Sorting is such an important problem that it is worthwhile developing direct methods that 
are as fast, simple, and space-efficient as possible. 

The second part of this chapter shows that, if we allow other operations besides 
comparisons, then all bets are off. Indeed, by using array-indexing, it is possible to sort a 
set of n integers in the range {0, ...,n c - 1} in O(cn) time. 
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11.1 Comparison-Based Sorting 

In this section, we present three sorting algorithms: merge-sort, quicksort, and heap-sort. 
All these algorithms take an input array a and sort the elements of a into non-decreasing 
order in O(nlogn) (expected) time. These algorithms are all comparison-based. Their sec- 
ond argument, c, is a Comparator that implements the compare(a, b) method. These algo- 
rithms don't care what type of data is being sorted, the only operation they do on the data is 
comparisons using the compare(a, b) method. Recall, from Section 1.1.4, that compare(a, b) 
returns a negative value if a < b, a positive value if a > b, and zero if a = b. 

11.1.1 Merge-Sort 

The merge-sort algorithm is a classic example of recursive divide and conquer: If the length 
of a is at most 1, then a is already sorted, so we do nothing. Otherwise, we split a into two 
halves, aO = a[0],...,a[n/2- 1] and a1 = a[n/2], ...,a[n - 1]. We recursively sort aO and a1, 
and then we merge (the now sorted) aO and a1 to get our fully sorted array a: 

Algorithms 



<T> void mergeSort (T[ ] a, Comparator<T> c) { 

if (a. length <= 1) return; 

T[] aO = Arrays . copyOfRange(a, 0, a. length/2); 

T[] a1 = Arrays . copyOfRange( a, a. length/2, a. length); 

mergeSort (aO, c) ; 

mergeSort (a1 , c) ; 

merge(a0, a1 , a, c) ; 
} 



An example is shown in Figure 11.1. 

Compared to sorting, merging the two sorted arrays aO and a1 is fairly easy. We 

add elements to a one at a time. If aO or a1 is empty we add the next elements from the 

other (non-empty) array. Otherwise, we take the minimum of the next element in aO and 

the next element in a1 and add it to a: 

Algorithms 



<T> void merge(T[] aO, T[] a1, T[] a, Comparator<T> c) { 
int iO = 0, i1 = 0; 

for (int i = 0; i < a. length; i++) { 
if (10 == aO. length) 

a[ i] = a1 [i1++] ; 
else if (i1 == a1. length) 

a[i] = a0[i0++]; 
else if (compare(aO[iO] , a1[i1]) < 0) 
a[i] = a0[i0++]; 
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13 8 5 2 4 6 9 7 3 12 1 10 11 




aO 13 8 5 2 4 6 



mergeSort(aO, c) 



aO 2 4 5 6 8 13 



7 3 12 1 10 11 a1 



mergeSort(a1,c) 



1 3 7 9 10 11 12 a1 




a 1 2 3 4 5 6 7 8 9 10 11 12 13 



Figure 11.1: The execution of mergeSort(a, c) 



else 

a[l] = a1 [i1++] ; 



} 



Notice that the merge(a0, a1,a, c) algorithm performs at most n- 1 comparisons before 
running out of elements in one of aO or a 1 . 

To understand the running-time of merge-sort, it is easiest to think of it in terms 
of its recursion tree. Suppose for now that n is a power of 2, so that n = 2 lo S n , and logn is 
an integer. Refer to Figure 11.2. Merge-sort turns the problem of sorting n elements into 
2 problems, each of sorting n/2 elements. These two subproblem are then turned into 2 
problems each, for a total of 4 subproblems, each of size n/4. These 4 subproblems become 
8 subproblems, each of size n/8, and so on. At the bottom of this process, n/2 subproblems, 
each of size 2, are converted into n problems, each of size 1 . For each subproblem of size 
n/2 ! , the time spent merging and copying data is 0(n/2 ! ). Since there are 2' subproblems 
of size n/2 ! , the total time spent working on problems of size 2 ! , not counting recursive 
calls, is 

2 ! ' x 0(n/2 ! ) = O(n) . 

Therefore, the total amount of time taken by merge-sort is 

logn 

^0(n) = 0(nlogn) . 
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= n 



= n 



1 + 1 + 1 + 1 + 1 + 1+ ■■■ +1 + 1 + 1 + 1 + 1 + 1 =n 
Figure 11.2: The merge-sort recursion tree. 

The proof of the following theorem is based on the same analysis as above, but has 
to be a little more careful to deal with the cases where n is not a power of 2. 

Theorem 11.1. The mergeSort(a, c) algorithm runs in O(nlogn) time and performs at most 
n log n comparisons. 

Proof. The proof is by induction on n. The base case, in which n = 1, is trivial; when 
presented with an array of length or 1 the algorithm simply returns without performing 
any comparisons. 

Merging two sorted lists of total length n requires at most n - 1 comparisons. Let 
C(n) denote the maximum number of comparisons performed by mergeSort(a, c) on an 
array a of length n. If n is even, then we apply the inductive hypothesis to the two sub- 
problems and obtain 

C(n)<n-l + 2C(n/2) 

<n-l + 2((n/2)log(n/2)) 
= n-l+nlog(n/2) 
= n - 1 + n log n - n 
< nlosn . 
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The case where n is odd is slightly more complicated. For this case, we use two inequalities, 
that are easy to verify: 

log(x+l)<log(x) + l , (11.1) 

for all x > 1 and 

log(x+l/2) + log(x-l/2)<21og(x) , (11.2) 

for all x > 1/2. Inequality (11.1) comes from the fact that log(x) + 1 = log(2x) while (11.2) 
follows from the fact that log is a concave function. With these tools in hand we have, for 
odd n, 

C(n) < n - 1 + C(["n/2~|) + C(|_n/2J) 

< n - 1 + fn/21 logrn/21 + [n/2\ logLn/2J 

= n - 1 + (n/2 + l/2)log(n/2 + 1/2) + (n/2 - l/2)log(n/2 - 1/2) 

< n - 1 + n log(n/2) + (l/2)(log(n/2 + 1/2) - log(n/2 - 1/2)) 

< n-l + nlog(n/2)+l/2 

< n + nlog(n/2) 
= n + n(logn - 1) 

= nlogn . □ 

11.1.2 Quicksort 

The quicksort algorithm is another classic divide and conquer algorithm. Unlike merge- 
sort, which does merging after solving the two subproblems, quicksort does all its work 
upfront. 

The algorithm is simple to describe: Pick a random pivot element, x, from a; par- 
tition a into the set of elements less than x, the set of elements equal to x, and the set of 
elements greater than x; and, finally, recursively sort the first and third sets in this parti- 
tion. An example is shown in Figure 11.3. 

Algorithms 



<T> void quickSort (T[ ] a, Comparator<T> c) { 
quickSort (a, 0, a. length, c); 

} 

<T> void quickSort (T[ ] a, int i, int n, Comparator<T> c) { 

if (n <= 1) return; 

T x = a[i + rand. nextlnt (n) ] ; 

int p = i-1, j = i, q = i+n; 

// a[i..p]<x, a[ p+1 . .q-1 ]??x, a[q. . i+n-1 ]>x 
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Figure 11.3: An example execution of quickSort(a, 0, 14,c) 



while (j < q) { 

int comp = compare(a[ j ] , x); 

if (comp < 0) { // move to beginning of array 

swap(a, j++, ++p) ; 
} else if (comp > 0) { 

swap(a, j, --q); // move to end of array 
} else { 

j++; // keep in the middle 

} 
} 

// a[i..p]<x, a[p+1 . . q-1 ]=x , a[q. . i+n-1 ]>x 
quickSort(a, i, p-i+1, c); 
quickSort(a, q, n-(q-i), c); 



} 



All of this is done in-place, so that instead of making copies of subarrays being sorted, 
the quickSort(a, i,n, c) method only sorts the subarray a[i], ...,a[i + n - 1]. Initially this 
method is called as quickSort(a,0,a.length,c). 

At the heart of the quicksort algorithm is the in-place partitioning algorithm. This 
algorithm, without using any extra space, swaps elements in a and computes indices p and 
q so that 

<x if < i < p 
ll<=x ifp<i<q 

>x ifq<i<n-l 

This partitioning, which is done by the while loop in the code, works by iteratively in- 
creasing p and decreasing q while maintaining the first and last of these conditions. At 
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each step, the element at position j is either moved to the front, left where it is, or moved 
to the back. In the first two cases, j is incremented, while in the last case, j is not incre- 
mented since the new element at position j has not been processed yet. 

Quicksort is very closely related to the random binary search trees studied in Sec- 
tion 7.1. In fact, if the input to quicksort consists of n distinct elements, then the quicksort 
recursion tree is a random binary search tree. To see this, recall that when constructing a 
random binary search tree the first thing we do is pick a random element x and make it the 
root of the tree. After this, every element will eventually be compared to x, with smaller 
elements going into the left subtree and larger elements going into the right subtree. 

In quicksort, we select a random element x and immediately compare everything 
to x, putting the smaller elements at the beginning of the array and larger elements at the 
end of the array. Quicksort then recursively sorts the beginning of the array and the end 
of the array while the random binary search tree recursively inserts smaller elements in 
the left subtree of the root and larger elements in the right subtree of the root. 

The above correspondence between random binary search trees and quicksort means 
that we can translate Lemma 7.1 to a statement about quicksort: 

Lemma 11.1. When quicksort is called to sort an array containing the integers 0,...,n- 1, the 
expected number of times element i is compared to a pivot element is at most H 1+1 + H n _ i . 

A little summing of harmonic numbers gives us the following theorem about the 
running time of quicksort: 

Theorem 11.2. When quicksort is called to sort an array containing n distinct elements, the 
expected number of comparisons performed is at most 2nlnn + O(n). 

Proof. Let T be the number of comparisons performed by quicksort when sorting n distinct 
elements. Using Lemma 11.1, we have: 

n-l 



[T] 


1=0 

n 

1 = 1 

n 

;=i 


+ Hn-i 



<2nlnn + 2n = 2nlnn + 0(n) □ 
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Theorem 11.3 describes the case where the elements being sorted are all distinct. 
When the input array, a, contains duplicate elements, the expected running time of quick- 
sort is no worse, and can be even better; any time a duplicate element x is chosen as a 
pivot, all occurrences of x get grouped together and don't take part in either of the two 
subproblems. 

Theorem 11.3. The quickSort(a, c) method runs in O(nlogn) expected time and the expected 
number of comparisons it performs is at most 2nlnn + O(n). 

11.1.3 Heap-sort 

The heap-sort algorithm is another in-place sorting algorithm. Heap-sort uses the binary 
heaps discussed in Section 10.1. Recall that the BinaryHeap data structure represents a 
heap using a single array. The heap-sort algorithm converts the input array a into a heap 
and then repeatedly extracts the minimum value. 

More specifically, a heap stores n elements at array locations a[0],...,a[n - 1] with 
the smallest value stored at the root, a[0]. After transforming a into a BinaryHeap, the 
heap-sort algorithm repeatedly swaps a[0] and a[n - 1], decrements n, and calls trickleDown(O) 
so that a[0],...,a[n - 2] once again are a valid heap representation. When this process ends 
(because n = 0) the elements of a are stored in decreasing order, so a is reversed to obtain 
the final sorted order. 1 Figure 11.1.3 shows an example of the execution of heapSort(a, c). 
BinaryHeap 



<T> void sort(T[] a, Comparator<T> c) { 

BinaryHeap<T> h = new BinaryHeap<T>(a, c); 
while (h.n > 1) { 

h.swap(--h . n, 0) ; 

h . trickleDown(O) ; 

} 

Collections .reverse (Arrays .asList(a) ) ; 

} 



A key subroutine in heap sort is the constructor for turning an unsorted array 
a into a heap. It would be easy to do this in O(nlogn) time by repeatedly calling the 
BinaryHeap add(x) method, but we can do better by using a bottom-up algorithm. Recall 
that, in a binary heap, the children of a[i] are stored at positions a[2i + 1] and a[2i + 2]. 
This implies that the elements a[|_n/2J],...,a[n - 1] have no children. In other words, each 



The algorithm could alternatively redefine the compare(x, y) function so that the heap sort algorithm stores 
the elements directly in ascending order. 
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Figure 11.4: A snapshot of the execution of heapSort(a, c). The shaded part of the array is 
already sorted. The unshaded part is a BinaryHeap. During the next iteration, element 5 
will be placed into array location 8. 



of a[|_n/2J],...,a[n - 1] is a sub-heap of size 1. Now, working backwards, we can call 

trickleDown(i) for each i 6 {|_n/2J - 1,...,0}. This works, because by the time we call 

trickleDown(i), each of the two children of a[i] are the root of a sub-heap so calling 

trlckleDown(i) makes a[i] into the root of its own subheap. 

BinaryHeap 



BinaryHeap(T[ ] a, Comparator<T> c) { 
this . c = c; 
this .a = a; 
n = a. length; 

for (int i = n/2-1; i >= 0; i— ) { 
trickleDown(i) ; 



} 



} 



The interesting thing about this bottom-up strategy is that it is more efficient than 
calling add(x) n times. To see this, notice that, for n/2 elements, we do no work at all, for 
n/4 elements, we call trickleDown(i) on a subheap rooted at a[i] and whose height is 1, 
for n/8 elements, we call trickleDown(i) on a subheap whose height is 2, and so on. Since 
the work done by trickleDown(i) is proportional to the height of the sub-heap rooted at 
a[i], this means that the total work done is at most 

log n oo co 

J^ 0((i - l)n/2') < J^O(w/2 j ) = O(n) Ti/2 1 = 0(2n) = O(n) . 



/=i 



(=1 



i=i 



The second-last equality follows by recognizing that the sum Y4L1 */2 ! is equal, by defini- 
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tion, to the expected number times we toss a coin up to and including the first time the 
coin comes up as heads and applying Lemma 4.2. 

The following theorem describes the performance of heapSort(a, c). 

Theorem 11.4. The heapSort(a, c) method runs in O(nlogn) time and performs at most 2nlogn+ 
O(n) comparisons. 

Proof. The algorithm runs in 3 steps: (1) Transforming a into a heap, (2) repeatedly ex- 
tracting the minimum element from a, and (3) reversing the elements in a. We have just 
argued that step 1 takes O(n) time and performs O(n) comparisons. Step 3 takes O(n) time 
and performs no comparisons. Step 2 performs n calls to trickleDown(O). The ith such 
call operates on a heap of size n-i and performs at most 21og(n-z') comparisons. Summing 

this over i gives 

n-i n-i 

} 21og(n-z)< > 21ogn = 2nlogn 

i=0 i=0 

Adding the number of comparisons performed in each of the three steps completes the 

proof. □ 

11.1.4 A Lower-Bound for Comparison-Based Sorting 

We have now seen three comparison-based sorting algorithms that each run in O(nlogn) 
time. By now, we should be wondering if faster algorithms exist. The short answer to this 
question is no. If the only operations allowed on the elements of a are comparisons then 
no algorithm can avoid doing roughly nlogn comparisons. This is not difficult to prove, 
but requires a little imagination. Ultimately it follows from the fact that 

log(n!) = logn + log(n- 1) + hlog(l) = nlogn - O(n) . 

(Proving this fact is left as Exercise 1 1.5.) 

We will first focus our attention on deterministic algorithms like merge-sort and 
heap-sort and on a particular fixed value of n. Imagine such an algorithm is being used 
to sort n distinct elements. The key to proving the lower-bound is to observe that, for a 
deterministic algorithm with a fixed value of n, the first pair of elements that are com- 
pared is always the same. For example, in heapSort(a, c), when n is even, the first call to 
trickleDown(i) is with i = n/2 - 1 and the first comparison is between elements a[n/2 - 1 ] 
and a[n - 1]. 

Since all input elements are distinct, this first comparison has only two possible 
outcomes. The second comparison done by the algorithm may depend on the outcome of 
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a[1]£a[2] 



Figure 11.5: A comparison tree for sorting an array a[0],a[1],a[2] of length n = 3. 

the first comparison. The third comparison may depend on the results of the first two, and 
so on. In this way any deterministic comparison-based sorting algorithm can be viewed 
as a rooted binary comparison-tree. Each internal node, u, of this tree is labelled with a 
pair of indices u.i and u.j. If a[u.i] < a[u.j] the algorithm proceeds to the left subtree, 
otherwise it proceeds to the right subtree. Each leaf w of this tree is labelled with a permu- 
tation w.p[0], ...,w.p[n - 1] of 0,...,n - 1. This permutation represents the permutation that 
is required to sort a if the comparison tree reaches this leaf. That is, 

a[w.p[0]] < a[w.p[1]] < ••• < a[w.p[n- 1]] . 

An example of a comparison tree for an array of size n = 3 is shown in Figure 11.5. 

The comparison tree for a sorting algorithm tells us everything about the algo- 
rithm. It tells us exactly the sequence of comparisons that will be performed for any input 
array, a, having n distinct elements and it tells us how the algorithm will reorder a to sort 
it. An immediate consequence of this is that the comparison tree must have at least n! 
leaves; if not, then there are two distinct permutations that lead to the same leaf, so the 
algorithm does not correctly sort at least one of these permutations. 

For example, the comparison tree in Figure 1 1.6 has only 4 < 3! = 6 leaves. Inspect- 
ing this tree, we see that the two input arrays 3,1,2 and 3,2,1 both lead to the rightmost 
leaf. On the input 3, 1, 2 this leaf correctly outputs a[ 1 ] = 1, a[2] = 2, a[0] = 3. However, on 
the input 3,2,1, this node incorrectly outputs a[1] = 2,a[2] = l,a[0] = 3. This discussion 
leads to the primary lower-bound for comparison-based algorithms. 

Theorem 11.5. For any deterministic comparison-based sorting algorithm A and any integer 
n > 1, there exists an input array a of length n such that A performs at least log(n!) = nlogn - 
O(n) comparisons when sorting a. 
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a[1]£a[2] a[0]£a[2] 



a[0]<a[1]<a[2] a[0] < a[2] < a[1] a[1] < a[0] < a[2] a[1] < a[2] < a[0] 

Figure 11.6: A comparison tree that does not correctly sort every input permutation. 

Proof. By the above discussion, the comparison tree defined by A must have at least n! 
leaves. An easy inductive proof shows that any binary tree with k leaves has height at least 
log A:. Therefore, the comparison tree for A has a leaf, w, of depth at least log(n!) and there 
is an input array a that leads to this leaf. The input array a is an input for which A does at 
least log(n!) comparisons. □ 



Theorem 11.5 deals with deterministic algorithms like merge-sort and heap-sort, 
but doesn't tell us anything about randomized algorithms like quicksort. Could a random- 
ized algorithm beat the log(n!) lower bound on the number of comparisons? The answer, 
again, is no. Again, the way to prove it is to think differently about what a randomized 
algorithm is. 

In the following discussion, we will implicitly assume that our decision trees have 
been "cleaned up" in the following way: Any node that can not be reached by some input 
array a is removed. This cleaning up implies that the tree has exactly n! leaves. It has at 
least n! leaves because, otherwise, it could not sort correctly. It has at most n! leaves since 
each of the possible n! permutation of n distinct elements follows exactly one root to leaf 
path in the decision tree. 

We can think of a randomized sorting algorithm 1Z as a deterministic algorithm 
that takes two inputs: The input array a that should be sorted and a long sequence b = 
b\,b2,b$,...,b m of random real numbers in the range [0,1]. The random numbers provide 
the randomization. When the algorithm wants to toss a coin or make a random choice, it 
does so by using some element from b. For example, to compute the index of the first pivot 
in quicksort, the algorithm could use the formula \nbi\. 

Now, notice that if we fix b to some particular sequence b then 1Z becomes a de- 
terministic sorting algorithm, lZ(b), that has an associated comparison tree, T(b). Next, 
notice that if we select a to be a random permutation of { 1, . . . , n}, then this is equivalent to 
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selecting a random leaf, w, from the n! leaves of T(b). 

Exercise 1 1.7 asks you to prove that, if we select a random leaf from any binary tree 
with k leaves, then the expected depth of that leaf is at least log/c. Therefore, the expected 
number of comparisons performed by the (deterministic) algorithm lZ(b) when given an 
input array containing a random permutation of \\,...,n) is at least log(n!). Finally, notice 
that this is true for every choice of b, therefore it holds even for 1Z. This completes the 
proof of the lower-bound for randomized algorithms. 

Theorem 11.6. For any (deterministic or randomized) comparison-based sorting algorithm A 
and any integer n > 1, the expected number of comparisons done by A when sorting a random 
permutation of{l,...,n) is at least log(n!) = nlogn - O(n). 

11.2 Counting Sort and Radix Sort 

In this section we consider two sorting algorithms that are not comparison-based. These 
algorithms are specialized for sorting small integers. These algorithms get around the 
lower-bounds of Theorem 11.5 by using (parts of) the elements of a as indices into an 
array. Consider a statement of the form 

c[a[l]] = l . 

This statement executes in constant time, but has c. length possible different outcomes, 
depending on the value of a[i]. This means that the execution of an algorithm that makes 
such a statement can not be modelled as a binary tree. Ultimately, this is the reason that 
the algorithms in this section are able to sort faster than comparison-based algorithms. 

11.2.1 Counting Sort 

Suppose we have an input array a consisting of n integers, each in the range 0, . . . , k - 1 . The 
counting-sort algorithm sorts a using an auxiliary array c of counters. It outputs a sorted 
version of a as an auxiliary array b. 

The idea behind counting-sort is simple: For each i 6 {0, ...,k-l}, count the number 

of occurrences of i in a and store this in c[i]. Now, after sorting, the output will look like 

c[0] occurrences of 0, followed by c[1] occurrences of 1, followed by c[2] occurrences of 

2,. . . , followed by c [k - 1 ] occurrences of k - 1 . The code that does this is very slick, and its 

execution is illustrated in Figure 1 1.7: 

Algorithms 



int[] countingSort ( int [ ] a, int k) { 
int c[ ] = new int [k] ; 
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Figure 11.7: The operation of counting sort on an array of length n = 20 that stores integers 
0,...,k-l = 9. 



for (int i = 0; i < a. length; 


i++) 


c[a[i]]++; 




for (int i = 1; i < k; i++) 




c[i] += c[i-1]; 




int b[] = new int [a. length] ; 




for (int i = a.length-1; i >= 


0; i-) 


b[-c[a[i]]] = a[i]; 




return b; 




} 





The first for loop in this code sets each counter c[i] so that it counts the number 
of occurrences of i in a. By using the values of a as indices, these counters can all be 
computed in O(n) time with a single for loop. At this point, we could use c to fill in the 
output array b directly. However, this would not work if the elements of a have associated 
data. Therefore we spend a little extra effort to copy the elements of a into b. 

The next for loop, which takes O(k) time, computes a running-sum of the counters 
so that c[i] becomes the number of elements in a that are less than or equal to i. In 
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particular, for every i e {0, . . . , k - 1 }, the output array, b, will have 

b[c[i - 1]] = b[c[i- 1]+ 1] =••• = b[c[i]- 1] = i . 

Finally the algorithm scans a backwards to put its elements in order into an output array 
b. When scanning, the element a[i] = j is placed at location b[c[ j] - 1] and the value c[ j] 
is decremented. 

Theorem 11.7. The countingSort(a, k) method can sort an array a containing n integers in 
the set {0, ...,k - 1} in 0(n + k) time. 

The counting-sort algorithm has the nice property of being stable; it preserves the 
relative order of elements that are equal. If two elements a[i] and a[ j ] have the same value, 
and i < j then a[i] will appear before a[ j] in b. This will be useful in the next section. 

11.2.2 Radix-Sort 

Counting-sort is very efficient for sorting an array of integers when the length, n, of the 
array is not much smaller than the maximum value, k - 1, that appears in the array. The 
radix-sort algorithm, which we now describe, uses several passes of counting-sort to allow 
for a much greater range of maximum values. 

Radix-sort sorts w-bit integers by using w/d passes of counting-sort to sort these 

integers d bits at a time. 2 More precisely radix sort first sorts the integers by their least 

significant d bits, then their next significant d bits, and so on until, in the last pass, the 

integers are sorted by their most significant d bits. 

Algorithms 



int[] radixSort(int [ ] a) { 
int[ ] b = null; 

for (int p = 0; p < w/d; p++) { 
int c[] = new int[1<<d]; 

// the next three for loops implement counting-sort 
b = new int [a. length] ; 
for (int i = 0; i < a. length; i++) 

c[(a[i] » d*p)&((1«d)-1)]++; 
for (int i = 1; i < 1<<d; i++) 

c[i] += c[i-1]; 
for (int i = a.length-1; i >= 0; i--) 

b[-c[(a[i] » d*p)&((1«d)-1)]] = a[i]; 
a = b; 
} 



-We assume that d divides w, otherwise we can always increase w to dfw/d]. 
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Figure 1 1.8: Using radixsort to sort w = 8-bit integers by using 4 passes of counting sort on 
d = 2-bit integers. 



return b; 



} 



(In this code, the expression (a[i]>>d *p)&((1<<d) - 1) extracts the integer whose binary 
representation is given by bits (p + l)d- l,...,pd of a[i].) An example of the steps of this 
algorithm is shown in Figure 11.8. 

This remarkable algorithm sorts correctly because counting-sort is a stable sorting 
algorithm. If x < y are two elements of a and the most significant bit at which x differs 
from y has index r, then x will be placed before y during pass |_r/dj and subsequent passes 
will not change the relative order of x and y. 

Radix-sort performs w/d passes of counting-sort. Each pass requires 0(n + 2 d ) time. 
Therefore, the performance of radix-sort is given by the following theorem. 

Theorem 11.8. For any integer d > 0, the radixSort(a, k) method can sort an array a contain- 
ing n \N-bit integers in 0((w/d)(n + 2 d )) time. 

If we think, instead, of the elements of the array being in the range {0, ...,n c - 1}, 
and take d = flog n"| we obtain the following version of Theorem 11.8. 

Corollary 11.1. The radixSort(a, k) method can sort an array a containing n integer values 
in the range {0,...,n c - 1} in O(cn) time. 

11.3 Discussion and Exercises 

Sorting is probably the fundamental algorithmic problem in computer science, and has 
a long history. Knuth [46] attributes the merge-sort algorithm to von Neumann (1945). 
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Quicksort is due to Hoare [37]. The original heap-sort algorithm is due to Williams [73], 
but the version presented here (in which the heap is constructed bottom-up in O(n) time) 
is due to Floyd [26]. Lower-bounds for comparison-based sorting appear to be folklore. 
The following table summarizes the performance of these comparison-based algorithms: 





comparisons 


in-place 


Merge-sort 

Quicksort 

Heap-sort 


nlogn worst-case 
1.38nlogn + O(n) expected 
2n log n + O(n) worst-case 


No 
Yes 
Yes 



Each of these comparison-based algorithms has advantages and disadvantages. 
Merge-sort does the fewest comparisons and does not rely on randomization. Unfortu- 
nately it uses an auxilliary array during its merge phase. Allocating this array can be 
expensive and is a potential point of failure if memory is limited. Quicksort is an in-place 
algorithm and is a close second in terms of the number of comparisons, but is randomized 
so this running time is not always guaranteed. Heap-sort does the most comparisons, but 
it is in-place and deterministic. 

There is one setting in which merge-sort is a clear-winner; this occurs when sorting 
a linked-list. In this case, the auxiliary array is not needed; two sorted linked lists are very 
easily merged into a single sorted linked-list by pointer manipulations (see Exercise 11.1). 

The counting-sort and radix-sort algorithms described here are due to Seward [66, 
Section 2.4.6]. However, variants of radix-sort have been used since the 1920's to sort 
punch cards using punched card sorting machines. These machines can sort a stack of 
cards into two piles based on the existence (or not) of a hole in a specific location on the 
card. Repeating this process for different hole locations gives an implementation of radix- 
sort. 

Exercise 11.1. Implement a version of the merge-sort algorithm that sorts a DLList without 
using an auxiliary array. 

Exercise 11.2. Analyze the expected number of comparisons done by Quicksort a little 
more carefully than the proof of Theorem 11.3. In particular, show that the expected 
number of comparisons is 2nH n - n + H n . 

Exercise 11.3. The heap sort implementation described here sorts the elements into reverse 
sorted order and then reverses the array. This could be avoided by defining a new Com- 
parator that negates the results of the input Comparator, c. Explain why this would not 
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be a good optimization. (Hint: Consider how many negations would needed to be done in 
relation to how long it takes to reverse the output.) 

Exercise 11.4. Find another pair of permutations of 1,2,3 that are not correctly sorted by 
the comparison-tree in Figure 11.6. 

Exercise 11.5. Prove that logn! = nlogn - O(n). 

Exercise 11.6. Prove that a binary tree with k leaves has height at least log A:. 

Exercise 11.7. Prove that, if we pick a random leaf from a binary tree with k leaves, then 
the expected height of this leaf is at least log k. 
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Graphs 



In this chapter, we study two representations of graphs and basic algorithms on these 
representations. 

Mathematically, a (directed) graph is a pair G = (V,E) where V is a set of vertices and 
£ is a set of ordered pairs of vertices called edges. An edge (i, j) is directed from i to j; i is 
called the source of the edge and j is called the target. A path in G is a sequence of vertices 
VQ,...,Vk such that, for every i e {l,...,k}, the edge (Vj_i,V;) is in E. A path Vq,...,v^ is a 
cycle if, additionally, the edge (v^, v ) is in E. A path (or cycle) is simple if all of its vertices 
are unique. If there is a path from some vertex v, to some vertex v; then we say that V; is 
reachable from v,-. An example of a graph is shown in Figure 12.1. 

Graphs have an enormous number of applications, due to their ability to model so 
many phenomena. There are many obvious examples. Computer networks can be mod- 
elled as graphs, with vertices corresponding to computers and edges corresponding to (di- 
rected) communication links between those computers. Street networks can be modelled 
as graphs, with vertices representing intersections and edges representing streets joining 
consecutive intersections. 

Less obvious examples occur as soon as we realize that graphs can model any pair- 
wise relationships within a set. For example, in a university setting we might have a 
timetable conflict graph whose vertices represent courses offered in the university and 
in which the edge (i, j) is present if and only if there is at least one student that is tak- 
ing both class i and class j. Thus, an edge indicates that the exam for class i can not be 
scheduled at the same time as the exam for class j. 

Throughout this section, we will use n to denote the number of vertices of G and 
m to denote the number of edges of G. That is, n = \V\ and m = \E\. Furthermore, we will 
assume that V = {0,,..,n - 1}. Any other data that we would like to associate with the 
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Figure 12.1: A graph with 12 vertices. Vertices are drawn as numbered circles and edges 
are drawn as pointed curves pointing from source to target. 

elements of V can be stored in an array of length n. 

Some typical operations performed on graphs are: 

• addEdge(i, j): Add the edge (i, j) to E. 

• removeEdge(i, j): Remove the edge (i, j) from E. 

• hasEdge(i, j): Check if the edge (i, j) eE 

• outEdges(i): Return a List of all integers j such that (i, j) 6 E 

• inEdges(i): Return a List of all integers j such that (j,i) 6 £ 

Note that these operations are not terribly difficult to implement efficiently For ex- 
ample, the first three operations can be implemented directly by using a USet, so they can 
be implemented in constant expected time using the hash tables discussed in Chapter 5. 
The last two operations can be implemented in constant time by storing, for each vertex, a 
list of its adjacent vertices. 

However, different applications of graphs have different performance requirements 
for these operations and, ideally, we can use the simplest implementation that satisfies all 
the application's requirements. For this reason, we discuss two broad categories of graph 
representations. 
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12.1 AdjacencyMatrix: Representing a Graph by a Matrix 

An adjacency matrix is a way of representing an n vertex graph G = (V, E) by an nxn matrix, 

a, whose entries are boolean values. 
AdjacencyMatrix 



int n; 

boolean[ ] [ ] a; 

Ad jacencyMatrix(int nO) { 

n = nO; 

a = new boolean[n ] [n] ; 
} 



The matrix entry a[i][ j] is defined as 

(true if(i, j)e£ 
a[i][j] = 

[false otherwise 

The adjacency matrix for the graph in Figure 12.1 is shown in Figure 12.2. 

With this representation, the addEdge(i, j), removeEdge(i, j), and hasEdge(i, j) op- 
erations just involve setting or reading the matrix entry a[i][j]: 
AdjacencyMatrix 



void addEdge(int i, int j) { 
a[i] [ j] = true; 

} 

void removeEdge(int i, int j) { 
a[i] [ j ] = false; 

} 

boolean hasEdge(int i, int j) { 

return a[i] [ j ] ; 
} 



These operations clearly take constant time per operation. 

Where the adjacency matrix performs poorly is with the outEdges(i) and inEdges(i 

operations. To implement these, we must scan all n entries in the corresponding row or 

column of a and gather up all the indices, j, where a[i][ j], respectively a[j][i], is true. 

AdjacencyMatrix 



List<Integer> outEdges(int i) { 

List<Integer> edges = new ArrayList<Integer>( ) ; 

for (int j = 0; j < n; j++) 
if (a[i][j]) edges. add(j); 

return edges; 
} 
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Figure 12.2: A graph and its adjacency matrix. 
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List<Integer> inEdges(int i) { 

List<Integer> edges = new Arrayl_ist<Integer>( ) ; 

for (int j = 0; j < n; j++) 
if (a[j][i]) edges. add( j) ; 

return edges; 
} 



These operations clearly take O(n) time per operation. 

Another drawback of the adjacency matrix representation is that it is big. It stores 
an n x n boolean matrix, so it requires at least n 2 bits of memory. The implementation here 
uses a matrix of boolean values so it actually uses on the order of n 2 bytes of memory. 
A more careful implementation, that packs w boolean values into each word of memory 
could reduce this space usage to 0(n 2 /w) words of memory. 

Theorem 12.1. The AdjacencyMatrix data structure implements the Graph interface. An 
AdjacencyMatrix supports the operations 

• addEdge(i, j), removeEdge(i, j), and hasEdge(i, j) in constant time per operation; and 

• inEdges(i), and outEdges(i) in O(n) time per operation. 
The space used by an AdjacencyMatrix is 0(n 2 ). 

Despite the high memory usage and poor performance of the inEdges(i) and outEdges(i) 
operations, an AdjacencyMatrix can still be useful for some applications. In particular, 
when the graph G is dense, i.e., it has close to n 2 edges, then a memory usage of n 2 may be 
acceptable. 

The AdjacencyMatrix data structure is also commonly used because algebraic op- 
erations on the matrix a can be used to efficiently compute properties of the graph G. 
This is a topic for a course on algorithms, but we point out one such property here: If we 
treat the entries of a as integers (1 for true and for false) and multiply a by itself us- 
ing matrix multiplication then we get the matrix a 2 . Recall, from the definition of matrix 
multiplication, that 

n-l 

a 2 [i][j] = £>[i][k]-a[k][j] . 

k=0 

Interpreting this sum in terms of the graph G, this formula counts the number of vertices, 
k, such that G contains both edges (i, k) and (k, j). That is, it counts the number of paths 
from i to j (through intermediate vertices, k) that have length exactly 2. This observation 
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is the foundation of an algorithm that computes the shortest paths between all pairs of 
vertices in G using only O(logn) matrix multiplications. 

12.2 AdjacencyLists: A Graph as a Collection of Lists 

Adjacency list representations takes a more vertex-centric approach. There are many differ- 
ent possible implementations of adjacency lists. In this section, we present a simple one. 
At the end of the section, we discuss different possibilities. In an adjacency list represen- 
tation, the graph G = (V,E) is represented as an array adj, of lists. The list adj[i] contains 
a list of all the vertices adjacent to vertex i. That is, it contains every index j such that 

(i,j)e£. 
AdjacencyLists 



int n; 

List<Integer>[ ] adj; 

Ad jacencyLists(int nO) { 

n = nO; 

adj = (List<Integer>[ ] )new List[n]; 

for (int i = 0; i < n; i++) 

adj[i] = new ArrayStack<Integer>( Integer. class) ; 
} 



(An example is shown in Figure 12.3.) In this particular implementation, we represent 
each list in adj as an ArrayStack, because we would like constant time access by position. 
Other options are also possible. Specifically we could have implemented adj as a DLList. 

The addEdge(i, j) operation just appends the value j to the list adj [i]: 

AdjacencyLists 



void addEdge(int i, int j) { 

adj[i] .add( j); 
} 



This takes constant time. 

The removeEdge(i, j) operation searches through the list ad j[i] until it finds j and 

then removes it: 

AdjacencyLists 



void removeEdge(int i, int j) { 

Iterator<Integer> it = ad j [i ] . iterator^ ) ; 
while (it.hasNext( ) ) { 
if (it.next() == j) { 
it .remove( ) ; 
return; 
} 
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Figure 12.3: A graph and its adjacency lists 



This takes 0(deg(i)) time, where deg(i) (the degree of 1) counts the number of edges in E 
that have i as their source. 

The hasEdge(i, j) operation is similar; it searches through the list ad j[i] until it 
finds j (and returns true), or reaches the end of the list (and returns false): 

AdjacencyLists 



boolean hasEdge(int i, int j) { 

return ad j [i] . contains( j ) ; 
} 



This also takes 0(deg(i)) time. 

The outEdges(i) operation is very simple; It simply returns the list ad j[i] 

AdjacencyLists 



List<Integer> outEdges(int i) { 

return ad j [i] ; 
} 
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This clearly takes constant time. 

The inEdges(i) operation is much more work. It scans over every vertex j checking 

if the edge (i, j) exists and, if so, adding j to the output list: 

AdjacencyLists 



List<Integer> inEdges(int i) { 

List<Integer> edges = new ArrayStack<Integer>( Integer .class) ; 

for (int j = 0; j < n; j++) 

if (ad j [ j ] . contains(i) ) edges .add( j ) ; 

return edges; 
} 



This operation is very slow. It scans the adjacency list of every vertex, so it takes 0(n + m) 
time. 

The following theorem summarizes the performance of the above data structure: 

Theorem 12.2. The AdjacencyLists data structure implements the Graph interface. An Ad- 
jacencyLists supports the operations 

• addEdge(i, j) in constant time per operation; 

• removeEdge(i, j) and hasEdge(i, j) in 0(deg(i)) time per operation; 

• outEdges(i) in constant time per operation; and 

• inEdges(i) in 0(n + m) time per operation. 

The space used by a AdjacencyLists is 0(n + m). 

As alluded to earlier, there are many different choices to be made when implement- 
ing a graph as an adjacency list. Some questions that come up include: 

• What type of collection should be used to store each element of ad j? One could use 
an array-based list, a linked-list, or even a hashtable. 

• Should there be a second adjacency list, inadj, that stores, for each i, the list of 
vertices, j, such that (j,i) 6 £? This can greatly reduce the running-time of the 
inEdges(i) operation, but requires slightly more work when adding or removing 
edges. 

• Should the entry for the edge (i, j) in ad j[i] be linked by a reference to the corre- 
sponding entry in inad j [ j ]? 
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• Should edges be first-class objects with their own associated data? In this way, adj 
would contain lists of edges rather than lists of vertices (integers). 

Most of these questions come down to a tradeoff between complexity (and space) of im- 
plementation and performance features of the implementation. 

12.3 Graph Traversal 

In this section we present two algorithms for exploring a graph, starting at one of its ver- 
tices, i, and finding all vertices that are reachable from i. Both of these algorithms are 
best suited to graphs represented using an adjacency list representation. Therefore, when 
analyzing these algorithms we will assume that the underlying representation is as an Ad- 
jacencyLists. 

12.3.1 Breadth-First Search 

The bread-first-search algorithm starts at a vertex i and visits, first the neighbours of i, 
then the neighbours of the neighbours of i, then the neighbours of the neighbours of the 
neighbours of i, and so on. 

This algorithm is a generalization of the breadth-first-search algorithm for binary 
trees (Section 6.1.2), and is very similar; it uses a queue, q, that initially contains only i. 
It then repeatedly extracts an element from q and adds its neighbours to q, provided that 
these neighbours have never been in q before. The only major difference between breadth- 
first-search for graphs and for trees is that the algorithm for graphs has to ensure that 
it does not add the same vertex to q more than once. It does this by using an auxiliary 
boolean array, seen, that keeps track of which vertices have already been discovered. 

Algorithms 



void bfs(Graph g, int r) { 

boolean[] seen = new boolean [g. nVertices( )] ; 
Queue<Integer> q = new SI_l_ist<Integer>( ) ; 
q.add(r); 
seen[r] = true; 
while ( !q. isEmpty ( ) ) { 
int i = q.remove( ) ; 
for (Integer j : g. outEdges(i) ) { 
if (!seen[j]) { 
q.add( j); 
seen[j] = true; 
} 
} 
} 
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Figure 12.4: An example of bread-first-search starting at node 0. Nodes are labelled with 
the order in which they are added to q. Edges that result in nodes being added to q are 
drawn in black, other edges are drawn in grey. 

| } | 

An example of running bfs(g, 0) on the graph from Figure 12.1 is shown in Figure 12.4. 
Different executions are possible, depending on the ordering of the adjacency lists; Fig- 
ure 12.4 uses the adjacency lists in Figure 12.3. 

Analyzing the running-time of the bfs(g, i) routine is fairly straightforward. The 
use of the seen array ensures that no vertex is added to q more than once. Adding (and 
later removing) each vertex from q takes constant time per vertex for a total of O(n) time. 
Since each vertex is processed at most once by the inner loop, each adjacency list is pro- 
cessed at most once, so each edge of G is processed at most once. This processing, which is 
done in the inner loop takes constant time per iteration, for a total of O(m) time. Therefore, 
the entire algorithm runs in 0(n + m) time. 

The following theorem summarizes the performance of the bf s(g, r) algorithm. 

Theorem 12.3. When given as input a Graph, g, that is implemented using the Adjacen- 
cyLists data structure, the bfs(g,r) algorithm runs in 0(n -i-m) time. 

A breadth-first traversal has some very special properties. Calling bfs(g, r) will 
eventually enqueue (and eventually dequeue) every vertex j such that there is a directed 
path from r to j . Moreover, the vertices at distance from r (r itself) will enter q before the 
vertices at distance 1, which will enter q before the vertices at distance 2, and so on. Thus, 
the bfs(g, r) method visits vertices in increasing order of distance from r and vertices that 
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can not be reached from r are never output at all. 

A particularly useful application of the breadth-first-search algorithm is, therefore, 
in computing shortest paths. To compute the shortest path from r to every other vertex, 
we use a variant of bf s(g, r) that uses an auxilliary array p, of length n. When a new vertex 
j is added to q, we set p[j] = i. In this way p[ j] becomes the second last node on a shortest 
path from r to j. Repeating this, by taking p[p[j], p[p[p[j]]L an d so on we can reconstruct 
the (reversal of) a shortest path from r to j . 

12.3.2 Depth-First Search 

The depth-first-search algorithm is similar to the standard algorithm for traversing binary 
trees; it first fully explores one subtree before returning to the current node and then 
exploring the other subtree. Another way to think of depth-first-search is by saying that it 
is similar to breadth-first search except that it uses a stack instead of a queue. 

During the execution of the depth-first-search algorithm, each vertex, i, is assigned 
a color, c[i]: white if we have never seen the vertex before, grey if we are currently visiting 
that vertex, and black if we are done visiting that vertex. The easiest way to think of depth- 
first-search is as a recursive algorithm. It starts by visiting r. When visiting a vertex i, we 
first mark i as grey. Next, we scan i's adjacency list and recursively visit any white vertex 
we find in this list. Finally, we are done processing i, so we color i black and return. 

Algorithms 



void dfs(Graph g, int r) { 

byte[] c = new byte[g. nVertices( ) ] ; 
dfs(g, r, c); 

} 

void dfs(Graph g, int i, byte[] c) { 
c[i] = grey; // currently visiting i 
for (Integer j : g.outEdges( i) ) { 
if ( c [j] == white) { 
c[ j] = grey; 
dfs(g, j, c); 
} 
} 
c[i] = black; // done visiting i 

} 



An example of the execution of this algorithm is shown in Figure 12.5 

Although depth-first-search may best be thought of as a recursive algorithm, recur- 
sion is not the best way to implement it. Indeed, the code given above will fail for many 
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Figure 12.5: An example of depth-first-search starting at node 0. Nodes are labelled with 
the order in which they are processed. Edges that result in a recursive call are drawn in 
black, other edges are drawn in grey. 



large graphs by causing a stack overflow. An alternative implementation is to replace the 

recursion stack with an explicit stack, s. The following implementation does just that: 

Algorithms 



void dfs2(Graph g, int r) { 

byte[] c = new byte[g. nVertices( ) ] ; 
Stack<Integer> s = new Stack<Integer>( ) ; 
s . push(r) ; 

while ( !s . isEmpty ( ) ) { 
int i = s . pop( ) ; 
if ( c [i] == white) { 
c[i] = grey; 

g.outEdges(i) ) 



lor (int j 
s . push( j ) 



} 



} 



In the above code, when next vertex, i, is processed, i is colored grey and then replaced, 
on the stack, with its adjacent vertices. During the next iteration, one of these vertices will 
be visited. 

Not surprisingly, the running times of dfs(g, r) and dfs2(g, r) are the same as that 
of bfs(g,r): 

Theorem 12.4. When given as input a Graph, g, that is implemented using the Adjacen- 
cyLists data structure, the dfs(g,r) and dfs2(g,r) algorithms each run in 0(n + m) time. 
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Figure 12.6: The depth-first-search algorithm can be used to detect cycles in G.The node 
j is colored grey while i is still grey. This implies there is a path, P, from i to j in the 
depth-first-search tree, and the edge ( j, i) implies that P is also a cycle. 

As with the breadth-first-search algorithm, there is an underlying tree associated 
with each execution of depth-first-search. When a node i * r goes from white to grey, this 
is because df s(g, i, c) was called recursively while processing some node i'. (In the case of 
dfs2(g, r) algorithm, i is one of the nodes that replaced i' on the stack.) If we think of i' 
as the parent of i, then we obtain a tree rooted at r. In Figure 12.5, this tree is a path from 
vertex to vertex 11. 

An important property of the depth-first-search algorithm is the following: Sup- 
pose that when node i is colored grey, there exists a path from i to some other node j that 
uses only white vertices. Then j will be colored (first grey then) black before i is colored 
black. (This can be proven by contradiction, by considering any path P from i to j.) 

One application of this property is the detection of cycles. Refer to Figure 12.6. 
Consider some cycle, C, that can be reached from r. Let i be the first node of C that is 
colored grey, and let j be the node that precedes i on the cycle C. Then, by the above 
property, j will be colored grey and the edge (j,i) will be considered by the algorithm 
while i is still grey. Thus, the algorithm can conclude that there is a path, P, from i to j 
in the depth-first-search tree and the edge ( j, i) exists. Therefore, P is also a cycle. 

12.4 Discussion and Exercises 

The running times of the depth-first-search and breadth-first-search algorithms are some- 
what overstated by the Theorems 12.3 and 12.4. Define n r as the number of vertices, i, of 
G, for which there exists a path from r to i. Define m r as the number of edges that have 
these vertices as their sources. Then the following theorem is a more precise statement 
of the running times of the breadth-first-search and depth-first-search algorithms. (This 
more refined statement of the running time is useful in some of the applications of these 
algorithms outlined in the exercises.) 
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Theorem 12.5. When given as input a Graph, g, that is implemented using the Adjacen- 
cyLists data structure, the bf s(g,r), df s(g,r) and df s2(g,r) algorithms each run in 0(n r +m r ) 
time. 

Breadth-first search seems to have been discovered independently by Moore [50] 
and Lee [47] in the contexts of maze exploration and circuit routing, respectively. 

Adjacency-list representations of graphs were first popularized by Hopcroft and 
Tarjan [38] as an alternative to the (then more common) adjacency-matrix representa- 
tion. This representation, and depth-first-search, played a major part in the celebrated 
Hopcroft-Tarjan planarity testing algorithm that can determine, in O(n) time, if a graph 
can be drawn, in the plane, and in such a way that no pair of edges cross each other [39]. 

In the following exercises, an undirected graph is one in which, for every i and j, 
the edge (i, j) is present if and only if the edge ( j,i) is present. 

Exercise 12.1. Let G be an undirected graph. We say G is connected if, for every pair of 
vertices i and j in G, there is a path from i to j (since G is undirected, there is also a path 
from j to i). Show how to test if G is connected in 0(n + m) time. 

Exercise 12.2. Let G be an undirected graph. A connected-component labelling of G parti- 
tions to the vertices of G into maximal sets, each of which forms a connected subgraph. 
Show how to compute a connected component labelling of G in 0(n + m) time. 

Exercise 12.3. Let G be an undirected graph. A spanning forest of G is a collection of trees, 
one per component, whose edges are edges of G and whose vertices contain all vertices of 
G. Show how to compute a spanning forest of of G in 0(n + m) time. 

Exercise 12 A. We say that a graph G is stronly-connected if, for every pair of vertices i and 
j in G, there is a path from i to j. Show how to test if G is strongly-connected in 0(n + m) 
time. 

Exercise 12.5. Given a graph G = (V, E) and some special vertex r e V, show how to com- 
pute the length of the shortest path from r to i for every vertex i eV. 

Exercise 12.6. Give a (simple) example where the df s(g, r) code visits the nodes of a graph 
in an order that is different from that of the dfs2(g, r) code. Write a version of dfs2(g, r) 
that always visits nodes in exactly the same order as df s(g, r). (Hint: Just start tracing the 
execution of each algorithm on some graph where r is the source of more than 1 edge.) 
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Data Structures for Integers 



In this chapter, we return to the problem of implementing an SSet. The difference now 
is that we assume the elements stored in the SSet are w-bit integers. That is, we want to 
implement add(x), remove(x), and find(x) where x 6 {0, ...,2 W - 1}. It is not too hard to 
think of plenty of applications where the data — or at least the key that we use for sorting 
the data — is an integer. 

We will discuss three data structures, each building on the ideas of the previous. 
The first structure, the BinaryTrie performs all three SSet operations in O(w) time. This 
is not very impressive, since any subset of {0, . . . , 2 W - 1 } has size n < 2 W , so that log n < w. All 
the other SSet implementations discussed in this book perform all operations in O(logn) 
time so they are all at least as fast as a BinaryTrie. 

The second structure, the XFastTrie, speeds up the search in a BinaryTrie by 
using hashing. With this speedup, the f ind(x) operation runs in O(logw) time. However, 
add(x) and remove(x) operations in an XFastTrie still take O(w) time and the space used 
by an XFastTrie is 0(n ■ w). 

The third data structure, the YFastTrie, uses an XFastTrie to store only a sample 
of roughly one out of every w elements and stores the remaining elements a standard SSet 
structure. This trick reduces the running time of add(x) and remove(x) to O(logw) and 
decreases the space to O(n). 

The implementations used as examples in this chapter can store any type of data, as 
long an integer can be associated with it. In the code samples, the variable ix is always the 
integer value associated with x, and the method in.intValue(x) converts x to its associated 
integer. In the text, however, we will simply treat x as if it is an integer. 
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13.1. BinaryTrie: A digital search tree 
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Figure 13.1: The integers stored in a binary trie are encoded as root-to-leaf paths. 

13.1 BinaryTrie: A digital search tree 

A BinaryTrie encode a set of w bit integers in a binary tree. All leaves in the tree have 
depth w and each integer is encoded as a root-to-leaf path. The path for the integer x turns 
left at level i if the ith most significant bit of x is a and turns right if it is a 1. Figure 13.1 
shows an example for the case w = 4, in which the trie stores the integers 3(0011), 9(1001), 
12(1100), and 13(1101). 

Because the search path for a value x depends on the bits of x it will be helpful 
to name the children of a node, u, u.child[0] (left) and u.child[1] (right). These child 
pointers will actually serve double-duty Since the leaves in a binary trie have no children, 
the pointers are used to string the leaves together into a doubly-linked list. For a leaf in 
the binary trie u.child[0] (prev) is the node that comes before u in the list and u.child[1] 
(next) is the node that follows u in the list. A special node, dummy, is used both before the 
first node and after the last node in the list (see Section 3.2). 

Each node, u, also contains an additional pointer u. jump. If u's left child is missing, 
then u. jump points to the smallest leaf in u's subtree. If u's right child is missing, then 
u. j ump points to the largest leaf in u's subtree. An example of a BinaryTrie, showing j ump 
pointers and the doubly-linked list at the leaves, is shown in Figure 13.2 

The f ind(x) operation in a BinaryTrie is fairly straightforward. We try to follow 
the search path for x in the trie. If we reach a leaf, then we have found x. If, we reach a node 
u where we cannot proceed (because u is missing a child) then we follow u.jump, which 
takes us either to smallest leaf larger than x or the largest leaf smaller than x. Which of 
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Figure 13.2: A BinaryTrie with jump pointers shown as curved dashed edges. 



these two cases occurs depends on whether u is missing its left or right child, respectively. 

In the former case (u is missing its left child), we have found the value we are looking for. 

In the latter case (u is missing its right child), we can use the linked list to reach the value 

we are looking for. Each of these cases is illustrated in Figure 13.3. 
BinaryTrie 



T find(T x) { 

int i, c = 0, ix = it . intValue(x) ; 

Node u = r; 

for (i = 0; i < w; i++) { 

c = (ix >>> w-i-1 ) & 1 ; 

if (u.child[c] == null) break; 

u = u.child[c] ; 

} 

if (i == w) return u.x; // found it 
u = (c == 0) ? u.jump : u . jump.child[next ] ; 
return u == dummy ? null : u.x; 
} 



The running-time of the f ind(x) method is dominated by the time it takes to follow a 
root-to-leaf path, so it runs in O(w) time. 

The add(x) operation in a BinaryTrie is fairly straightforward, but it has a lot of 
things to take care of: 

1. It follows the search path for x until reaching a node u where it can no longer pro- 
ceed. 

2. It creates the remainder of the search path from u to a leaf that contains x. 
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Figure 13.3: The paths followed by f ind(5) and f ind(8). 
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Figure 13.4: Adding the values 2 and 15 to the BinaryTrie in Figure 13.2. 

3. It adds the node, u', containing x to the linked list of leaves (it has access to u"s prede- 
cessor, pred, in the linked list from the jump pointer of the last node, u, encountered 
during step 1.) 

4. It walks back up the search path for x adjusting jump pointers at the nodes whose 
jump pointer should now point to x. 



An addition is illustrated in Figure 13.4. 
BinaryTrie 



boolean add(T x) { 

int i, c = 0, ix = it . intValue(x) ; 

Node u = r; 

// 1 - search for ix until falling out of the trie 
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for (i = 0; i < w; i++) { 
c = (ix >>> w-i-1 ) & 1 ; 
if (u.child[c] == null) break; 
u = u.child[c] ; 

} 

if (1 == w) return false; // trie already contains x - abort 

Node pred = (c == right) ? u.jump : u . jump.child[0] ; // save for step 

u.jump = null; // u will have two children shortly 

112- add path to ix 
for ( ; i < w; i++) { 

c = (ix >>> w-i-1 ) & 1 ; 
u.child[c] = newNode(); 
u . child[c] . parent = u; 
u = u.child[c] ; 

} 

u . x = x; 

113- add u to linked list 
u.child[prev] = pred; 
u.child[next ] = pred. child[next ] ; 
u.child[prev] .child[next ] = u; 

u. child[next ] . child[prev] = u; 

II A - walk back up, updating jump pointers 

Node v = u. parent; 

while (v != null) { 

if ( (v.child[left] == null 

&& (v. jump == null |j it . intValue( v. jump. x) > ix)) 
|| ( v. child[right ] == null 

&& (v. jump == null || it . intValue( v. jump. x) < ix))) 
v. jump = u; 
v = v. parent; 

} 

n++; 

return true; 



This method performs one walk down the search path for x and one walk back up. Each 
step of these walks takes constant time, so the add(x) runs in O(w) time. 

The remove(x) operation undoes the work of add(x). Like add(x), it has a lot of 
things to take care of: 

1. It follows the search path for x until reaching the leaf, u, containing x. 

2. It removes u from the doubly-linked list 
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Figure 13.5: Removing the value 9 from the BinaryTrie in Figure 13.2. 

3. It deletes u and then walks back up the search path for x deleting nodes until reach- 
ing a node v that has a child that is not on the search path for x 

4. It walks upwards from v to the root updating any jump pointers that point to u. 



A removal is illustrated in Figure 13.5. 

BinaryTrie 

boolean remove(T x) { 

// 1 - find leaf, u, containing x 
int i = 0, c, ix = it . intValue(x) ; 
Node u = r; 

for (i = 0; i < w; i++) { 
c = (ix >>> w-i-1 ) & 1 ; 
if (u.child[c] == null) return false; 
u = u.child[c] ; 

} 

II 2 - remove u from linked list 

u . child[prev] . child[next ] = u.child[next ] ; 

u . child[next ] . child[prev] = u . child[prev] ; 

Node v = u; 

113- delete nodes on path to u 

for (i = w-1; i >= 0; i--) { 

c = (ix >>> w-i-1 ) & 1 ; 

v = v. parent; 

v.child[c] = null; 

if ( v. child[ 1-c ] != null) break; 

} 

// 4 - update jump pointers 

v. jump = u; 
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for (; i >= 0; i— ) { 
c = (ix >>> w-i-1 ) & 1 ; 
if (v. jump == u) 

v.jump = u.child[ 1-c] ; 
v = v. parent; 

} 

n--; 

return true; 
} 



Theorem 13.1. A BinaryTrie implements the SSet interface for w -bit integers. A BinaryTrie 
supports the operations add(x), remove(x), and f ind(x) in 0(w) time per operation. The space 
used by a BinaryTrie that stores n values is 0(n • w). 

13.2 XFastTrie: Searching in Doubly-Logarithmic Time 

The performance of the BinaryTrie structure is not that impressive. The number of el- 
ements, n, stored in the structure is at most 2 W , so logn < w. In other words, any of the 
comparison-based SSet structures described in other parts of this book are at least as effi- 
cient as a BinaryTrie, and don't have the restriction of only being able to store integers. 

Next we describe the XFastTrie, which is just a BinaryTrie with w + 1 hash tables — 
one for each level of the trie. These hash tables are used to speed up the f ind(x) operation 
to O(logw) time. Recall that the find(x) operation in a BinaryTrie is almost complete 
once we reach a node, u, where the search path for x would like to proceed to u. right (or 
u.left) but u has no right (respectively, left) child. At this point, the search uses u. jump to 
jump to a leaf, v, of the BinaryTrie and either return v or its successor in the linked list 
of leaves. An XFastTrie speeds up the search process by using binary search on the levels 
of the trie to locate the node u. 

To use binary search, we need a way to determine if the node u we are looking for 
is above a particular level, i, of if u is at or below level i. This information is given by 
the highest-order i bits in the binary representation of x; these bits determine the search 
path that x takes from the root to level i. For an example, refer to Figure 13.6; in this 
figure the last node, u, on search path for 14 (whose binary representation is 1110) is the 
node labelled 11** at level 2 because there is no node labelled 111* at level 3. Thus, we 
can label each node at level i with an i-bit integer. Then the node u we are searching 
for is at or below level i if and only if there is a node at level i whose label matches the 
highest-order i bits of x. 

In an XFastTrie, we store, for each i 6 {0,...,w}, all the nodes at level i in a USet, 
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^ 3 *■ * 9 * 12 13 > 4 



Figure 13.6: The search path for 14 (1110) ends at the node labelled 11** since there is no 
node labelled 111*. 



t[i], that is implemented as a hash table (Chapter 5). Using this USet allows us to check in 
constant expected time if there is a node at level i whose label matches the highest-order 
i bits of x. In fact, we can even find this node using t[i].f ind(x>>(w - i)). 

The hash tables t[0],...,t[w] allow us to use binary search to find u. Initially, we 
know that u is at some level 1 with < i < w + 1. We therefore initialize 1 = and h = w + 1 
and repeatedly look at the hash table t[i], where 1 = [(1 + h)/2J. If t[i] contains a node 
whose label matches x's highest-order i bits then we set 1 = i (u is at or below level i), 
otherwise we set h = i (u is above level i). This process terminates when h-1 < 1, in 
which case we determine that u is at level 1. We then complete the f ind(x) operation using 
u. jump and the doubly-linked list of leaves. 
XFastTrie 



T find(T x) { 

int 1 = 0, h = w+1, ix = it . intValue(x) ; 
Node v, u = r, q = newl\lode(); 
while (h-1 > 1) { 
int i = (l+h)/2; 
q. prefix = ix >>> w-i; 
if ((v = t[i].find(q)) == null) { 

h = i; 
} else { 
u = v; 
1 = 1; 
} 
} 
if (1 == w) return u.x; 
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Node pred = (((ix >>> w-1-1) & 1) == 1) ? u.jump : u . jump. child[0] ; 
return (pred.child[next ] == dummy) ? null : pred. child[next ] . x; 
} 



Each iteration of the while loop in the above method decreases h - 1 by roughly a factor of 
2, so this loop finds u after O(logw) iterations. Each iteration performs a constant amount 
of work and one f ind(x) operation in a USet, which takes constant expected time. The 
remaining work takes only constant time, so the f ind(x) method in an XFastTrie takes 
only O(logw) expected time. 

The add(x) and remove(x) methods for an XFastTrie are almost identical to the 
same methods in a BinaryTrie. The only modifications are for managing the hash tables 
t[0],. . . ,t[w]. During the add(x) operation, when a new node is created at level i, this node 
is added to t[i]. During a remove(x) operation, when a node is removed form level i, this 
node is removed from t[i]. Since adding and removing from a hash table take constant 
expected time, this does not increase the running times of add(x) and remove(x) by more 
than a constant factor. We omit a code listing or add(x) and remove(x) since it is almost 
identical to the (long) code listing already provided for the same methods in a BinaryTrie. 

The following theorem summarizes the performance of an XFastTrie: 

Theorem 13.2. An XFastTrie implements the SSet interface for w-bit integers. An XFastTrie 

supports the operations 

• add(x) and remove(x) in O(w) expected time per operation and 

• f ind(x) in O(logw) expected time per operation. 

The space used by an XFastTrie that stores n values is 0(n • w). 

13.3 YFastTrie: A Doubly-Logarithmic Time SSet 

The XFastTrie is a big improvement over the BinaryTrie in terms of query time — some 
would even call it an exponential improvement — but the add(x) and remove(x) operations 
are still not terribly fast. Furthermore, the space usage, 0(n • w), is higher than the other 
SSet implementation in this book, which all use O(n) space. These two problems are re- 
lated; if n add(x) operations build a structure of size n -w then the add(x) operation requires 
on the order of w time (and space) per operation. 

The YFastTrie data structure simultaneously addresses both the space and speed 
issues of XFastTries. A YFastTrie uses an XFastTrie, xf t, but only stores 0(n/w) values 
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^ 3 *■ *• 9 * "*15 




Figure 13.7: A YFastTrie containing the values 0, 1, 3, 4, 6, 8, 9, 10, 11, and 13. 



in xf t. In this way, the total space used by xf t is only O(n). Furthermore, only one out of 
every w add(x) or remove(x) operations in the YFastTrie results in an add(x) or remove(x) 
operation in xft. By doing this, the average cost incurred by calls to xft's add(x) and 
remove(x) operations is only constant. 

The obvious question becomes: If xft only stores n/w elements, where do the re- 
maining n(l - 1/w) elements go? These elements go into secondary structures, in this case an 
extended version of treaps (Section 7.2). There are roughly n/w of these secondary struc- 
tures so, on average, each of them stores O(w) items. Treaps support logarithmic time SSet 
operations, so the operations on these treaps will run in O(logw) time, as required. 

More concretely, a YFastTrie contains an XFastTrie, xft, that contains a random 
sample of the data, where each element appears in the sample independently with proba- 
bility 1/w. For convenience, the value 2 W - 1, is always contained in xft. Let x < xj < ■ • • < 
xjt_i denote the elements stored in xft. Associated with each element, x,, is a treap, t,, 
that stores all values in the range x,_! + l,...,x,. This is illustrated in Figure 13.7. 

The f ind(x) operation in a YFastTrie is fairly easy. We search for x in xft and find 
some value x,- associated with the treap t,. When then use the treap f ind(x) method on t,- 
to answer the query. The entire method is a one-liner: 
YFastTrie 



T find(T x) { 

return xft.find(new Pair<T>( it . intValue(x) ) ) . t .f ind(x) ; 
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} 



The first f ind(x) operation (on xf t) takes O(logw) time. The second f ind(x) operation (on 
a treap) takes O(logr) time, where r is the size of the treap. Later in this section, we will 
show that the expected size of the treap is O(w) so that this operation takes O(logw) time. 1 
Adding an element to a YFastTrie is also fairly simple — most of the time. The 
add(x) method calls xf t.f ind(x) to locate the treap, t, into which x should be inserted. It 
then calls t.add(x) to add x to t. At this point, it tosses a biased coin, that comes up heads 
with probability 1/w. If this coin comes up heads, x will be added to xf t. 

This is where things get a little more complicated. When x is added to xf t, the 
treap t needs to be split into two treaps 1 1 and t'. The treap 1 1 contains all the values less 
than or equal to x; t' is the original treap, t, with the elements of t1 removed. Once this 
is done we add the pair (x, t1) to xf t. Figure 13.8 shows an example. 
YFastTrie 



boolean add(T x) { 

int ix = it . intValue(x) ; 

STreap<T> t = xft.find(new Pair<T>(ix) ) . t ; 
if (t.add(x)) { 
n++; 

if (rand.nextlnt (w) == 0) { 
STreap<T> t1 = t.split(x); 
xft.add(new Pair<T>(ix, t1)); 

} 

return true; 

} 

return false; 

} 



Adding x to t takes O(logw) time. Exercise 7.9 shows that splitting t into t1 and t' can 

also be done in O(logw) expected time. Adding the pair (x,t1) to xft takes O(w) time, but 

only happens with probability 1/w. Therefore, the expected running time of the add(x) 

operation is 

O(logw) + -O(w) = O(logw) 
w 

The remove(x) method just undoes the work performed by add(x). We use xft to 
find the leaf, u, in xft that contains the answer to xf t.f ind(x). From u, we get the treap, t, 
containing x and remove x from t. If x was also stored in xft (and x is not equal to 2 W - 1) 



This is an application of Jensen's Inequality: If E[r] = w, then Eflogr] < logw. 
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Figure 13.8: Adding the values 2 and 6 to a YFastTrie. The coin toss for 6 came up heads, 
so 6 was added to xf t and the treap containing 4, 5, 6, 8, 9 was split. 

then we remove x from xf t and add the elements from x's treap to the treap, t2, that is 

stored by u's successor in the linked list. This is illustrated in Figure 13.9. 
YFastTrie 



boolean remove(T x) { 
Int ix = it . intValue(x) ; 
Node<T> u = xft .f indNode(ix) ; 
boolean ret = u . x . t .remove(x ) ; 
if (ret) n--; 
if (u.x.x == ix && ix != Oxffffffff) { 

STreap<T> t2 = u. child[ 1 ] . x . t ; 

t2.absorb(u. x. t ) ; 

xft .remove(u . x) ; 

} 

return ret; 

} 



Finding the node u in xft takes O(logw) expected time. Removing x from t takes O(logw) 
expected time. Again, Exercise 7.9 shows that merging all the elements of t into t2 can 
be done in O(logw) time. If necessary, removing x from xft takes O(w) time, but x is only 
contained in xft with probability 1/w. Therefore, the expected time to remove an element 
from a YFastTrie is O(logw). 

Earlier in the discussion, we put off arguing about the sizes of treaps in this struc- 
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Figure 13.9: Removing the values 1 and 9 from a YFastTrie in Figure 13.8. 

ture until later. Before finishing we prove the result we need. 

Lemma 13.1. Let x be an integer stored in a YFastTrie and let n x denote the number of ele- 
ments in the treap, t, that contains x. Then E[n x ] < 2w- 1. 



Proof. Refer to Figure 13.10. Let Xi < x% < ••• < Xj = x < ••• < x n denote the elements stored 
in the YFastTrie. The treap t contains some elements greater than or equal to x. These 
are x,-, x, +1 ,...,x,- + y_ 1 , where Xj + y_j is the only one of these elements in which the biased 
coin toss performed in the add(x) method came up heads. In other words, E[j] is equal to 
the expected number of biased coin tosses required to obtain the first heads. 2 Each coin 
toss is independent and comes up heads with probability 1/w, so E[j] < w. (See Lemma 4.2 
for an analysis of this for the case w = 2.) 

Similarly the elements of t smaller than x are Xj_i,...,Xj_j; where all these k coin 
tosses come up tails and the coin toss for x,_fc_i comes up heads. Therefore, E[fc] < w-1, 
since this is the same coin tossing experiment considered previously but in which the last 
toss doesn't count. In summary n x = j + k, so 



E[n x ] = E[;' + Jk] = E[;'] + E[Jt]<2w-l . 



□ 



This analysis ignores the fact that j never exceeds n — i + 1. However, this only decreases E[/'], so the upper 
bound still holds 
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elements in treap, t, containing x 
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Figure 13.10: The number of elements in the treap, t, containing x is determined by two 
coin tossing experiments. 

Lemma 13.1 was the last piece in the proof of the following theorem, which sum- 
marizes the performance of the YFastTrie: 

Theorem 13.3. A YFastTrie implements the SSet interface for w-bit integers. A YFastTrie 
supports the operations add(x), remove(x), and find(x) in O(logw) expected time per operation. 
The space used by a YFastTrie that stores n values is 0(n + w). 

The w term in the space requirement comes from the fact that xf t always stores 
the value 2 W - 1. The implementation could be modified (at the expense of adding some 
extra cases to the code) so that it is unnecessary to store this value. In this case, the space 
requirement in the theorem becomes O(n). 

13.4 Discussion and Exercises 

The first data structure to provide O(logw) time add(x), remove(x), and f ind(x) operations 
was proposed by van Emde Boas and has since become known as the van Emde Boas (or 
stratified) tree. The original van Emde Boas structure had size 2 W , so was impractical for 
large integers. 

The XFastTrie and YFastTrie data structures were discovered by Willard [72]. 
The XFastTrie structure is very closely related to van Emde Boas trees. One view of this 
is that the hash tables in an XFastTrie replace arrays in a van Emde Boas tree. That is, 
instead of storing the hash table t[i], a van Emde Boas tree stores an array of length 2 1 . 

Another structure for storing integers is Fredman and Willard's fusion trees [28]. 
This structure can store n w-bit integers in O(n) space so that the f ind(x) operation runs in 
0((logn)/(logw)) time. By using a fusion tree when logw > ^Jlogr\ and a YFastTrie when 
logw < -y/logn one obtains an O(n) space data structure that can implement the f ind(x) 
operation in 0(^/logn) time. Recent lower-bound results of Patra§cu and Thorup [57] show 
that these results are optimal, at least for structures that use only O(n) space. 



234 



13. Data Structures for Integers 13.4. Discussion and Exercises 

Exercise 13.1. Design and implement a simplified version of a BinaryTrie that doesn't 
have a linked list or jump pointers, but can still do find(x) in O(w) time. 

Exercise 13.2. Design and implement a simplified implementation of an XFastTrie that 
doesn't use a binary trie at all. Instead, your implementation should store everything is 
stored in a doubly-linked list and w hash tables. 

Exercise 13.3. For an element x, let d(x) be the difference between x and the value returned 
by f ind(x) [if f ind(x) returns null, then define d(x). Design and implement a modified 
version of the f ind(x) operation in an XFastTrie that runs in 0(logd(x)) expected time. 
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Chapter 14 



External Memory Searching 



Throughout this book, we have been using the w-bit word-RAM model of computation de- 
fined in Section 1.3. An implicit assumption of this model is that our computer has a large 
enough random access memory to store all the data in the data structure. In some situa- 
tions, this assumption is not valid. There exist collections of data so big that there is no 
computer with a memory large enough to store all the data. In such cases, the application 
must resort to storing the data on some external storage medium such as a hard disk, a 
solid state disk, or even a network file server (which has its own external storage). 

Accessing an item from the external storage is extremely slow. The hard disk at- 
tached to the computer on which this book is being written has an average access time 
of 19ms and the solid state drive attached to this computer has an average access time of 
0.3ms. In contrast, the random access memory in this computer has an average access time 
of less than 0.000113ms. It is more than 2 500 times faster to access RAM than the solid 
state drive and more than 160 000 times faster to access RAM than the hard drive. 

These speeds are fairly typical; accessing a random byte from RAM is thousands 
of times faster than accessing a random byte from a hard disk or solid-state drive. Access 
time, however, does not tell the whole story. When we access a byte from a hard disk 
or solid state disk, an entire block of the disk is read. Each of the drives attached to this 
computer has a block size of 4 096; each time we read one byte, the drive gives us a block 
containing 4 096 bytes. If we organize our data structure carefully, this means that each 
disk access could yield 4 096 bytes that are helpful in completing whatever operation we 
are doing. 

This is the idea behind the external memory model of computation, illustrated schemat- 
ically in Figure 14.1. In this model, the computer has access to a large external memory 
where all the data resides. This memory is divided into memory blocks each containing B 
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CPU 



RAM 



X 



External Memory 




Figure 14.1: In the external memory model, accessing an individual item, x, in the external 
memory requires reading the entire block containing x into RAM. 

words. The computer also has a limited internal memory on which it can perform compu- 
tations. Transferring a block between internal memory and external memory takes con- 
stant time. Computations performed within the internal memory are free; they take no 
time at all. The fact that internal memory computations are free may seem a bit strange, 
but this is really just to emphasize the fact that external memory is so much slower than 
RAM. 

In the full-blown external memory model, the size of the internal memory is also 
a parameter. However, for the data structures described in this chapter, it is sufficient to 
have an internal memory that is capable of storing a constant number of blocks. In other 
words, to implement these data structures it is sufficient to have an internal memory of 
size 0(B). 

14.1 The Block Store 

The notion of external memory includes a large number of possible different devices, each 
of which has their own block size and is accessed with their own collection of system calls. 
To simplify the exposition of this chapter so that we can focus on the common ideas, we 
encapsulate external memory devices with an object called a BlockStore. A BlockStore 
stores a collection of memory blocks, each of size B. Each block is uniquely identified by 
its integer index. A BlockStore supports these operations: 
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1. readBlock(i): Return the contents of the block whose index is i. 

2. writeBlock(i, b): Write contents of b to the block whose index is i. 

3. placeBlock(b): Return a new index and store the contents of b at this index. 

4. freeBlock(i): Free the block whose index is i. This indicates that the contents of 
this block are no longer used so the external memory allocated by this block may be 
reused. 

The easiest way to imagine a BlockStore is to think of it as storing a file on 
disk that is partitioned into blocks, each containing B bytes. Then readBlock(i) and 
writeBlock(i, b) simply read and write bytes iJ3,...,(i + l)B - 1 of this file. In addition, a 
simple BlockStore could keep a free list of blocks that are available for use. Blocks freed 
with freeBlock(i) are added to the free list. In this way placeBlock(b) can use a block 
from the free list or, if none is available, can append a new block to the end of the file. 

14.2 B-Trees 

In this section we discuss a generalization of binary trees, called B-trees, that is efficient in 
the external memory model. Alternatively, B-trees can be viewed as the natural general- 
ization of 2-4 trees described in Section 9.1. (A 2-4 tree is a special case of a B-tree that we 
get by setting B = 2.) 

For any integer B > 2, a B-tree is a tree in which all leaves have the same depth and 
every non-root internal node, u, has at least B children and at most 2B children. The chil- 
dren of u are stored in an array u. children. The requirement on the number of children 
is relaxed at the root, which can have anywhere between 2 and 2B children. 

If the height of a B-tree is h, then it follows that the number, £, of leaves in the 
B-tree satisfies 

2B h - 1 <£<2{2B) h - 1 . 

Taking the logarithm of the first inequality and rearranging terms yields: 

" logB 
log^ 1 
logB 

= log B €+l . 
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Figure 14.2: A B-tree with B-2. 

That is, the height of a B-tree is proportional to the base-B logarithm of the number of 
leaves. 

Each node, u, in B-tree stores an array of keys u.keys[0], ...,u.keys[2B-l]. If u is an 
internal node with k children, then the number of keys stored at u is exactly k — \ and these 
are stored in u.keys[0],...,u.keys[fc - 2]. The remaining 2B - k + 1 array entries in u.keys 
are set to null. If u is a non-root leaf node, then u contains between B - 1 and 2B - 1 keys. 
The keys in a B-tree respect an ordering similar to the keys in a binary search tree. For any 
node, u, that stores k - 1 keys, 

u.keys[0] < u.keys[1] < ••• < u.keys [A:- 2] . 

If u is an internal node, then for every i 6 {0,...,k - 2), u.keys[i] is larger than every key 
stored in the subtree rooted at u.children[i] but smaller than every key stored in the 
subtree rooted at u.children[i + 1]. Informally, 

u.children[i] -< u.keys[i] -< u.children[i + 1] . 

An example of a B-tree with B = 2 is shown in Figure 14.2. 

Note that all the data stored in a B-tree node has size 0(B). Therefore, in an external 
memory setting, the value of B in a B-tree is chosen so that a node fits into a single external 
memory block. In this way, the time it takes to perform a B-tree operation in the external 
memory model is proportional to the number of nodes that are accessed (read or written) 
by the operation. 

For example, if the keys are 4 byte integers and the node indices are also 4 bytes, 
then setting B = 256 means that each node stores 

(4 + 4)x2B = 8x512 = 4096 



240 



14. External Memory Searching 



14.2. B-Trees 







1 


2 



16.5 



22 


23 





Figure 14.3: A successful search (for the value 4) and an unsuccessful search (for the value 
16.5) in a B-tree. Shaded nodes show where the value of z is updated during the searches. 



bytes of data. This would be a perfect value of B for the hard disk or solid state drive 

discussed in the introduction to this chaper, which have a block size of 4096 bytes. 

The BTree class, which implements a B-tree, stores a BlockStore, bs, that stores 

BTree nodes as well as the index, ri, of the root node. As usual, an integer, n, is used to 

keep track of the number of items in the data structure: 
BTree 



int n; 

BlockStore<Node> bs; 
int ri; 



14.2.1 Searching 

The implementation of the find(x) operation, which is illustrated in Figure 14.3, general- 
izes the f ind(x) operation in a binary search tree. The search for x starts at the root and 
uses the keys stored at a node, u, to determine in which of u's children the search should 
continue. 

More specifically, at a node u, the search checks if x is stored in u.keys. If so, x has 
been found and the search is complete. Otherwise, the search finds the smallest integer, 
i, such that u.keys[i] > x and continues the search in the subtree rooted at u.children[i]. 
If no key in u.keys is greater than x, then the search continues in u's rightmost child. Just 
like binary search trees, the algorithm keeps track of the most recently seen key z, that is 
larger than x. In case x is not found, z is returned as the smallest value that is greater or 

equal to x. 
BTree 



T find(T x) { 
T z = null; 
int ui = ri; 
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Figure 14.4: The execution of find I t(a, 27). 



while (ui >= 0) { 

Node u = bs .readBlock(ui) ; 

int i = f indlt (u.keys, x); 

if (i < 0) return u . keys[- (i+1 ) ] ; // found it 

if (u.keys[i] != null) 

z = u . keys[i] ; 
ui = u .children [i] ; 

} 

return z; 



Central to the find(x) method is the findlt(a, x) method that searches in a null-padded 
sorted array, a, for the value x. This method, illustrated in Figure 14.4, works for any array, 
a, where a[0],...,a[/c- 1] are a sequence of keys in sorted order and a[k], ...,a[a.length- 1] 
are all set to null. If x is in the array at position i, then findlt(a, x) returns -i - 1. 
Otherwise, it returns the smallest index, i, such that a[i] > x or a[i] = null. 
BTree 



int findIt(T[] a, T x) { 
int lo =0, hi = a. length; 
while (hi != lo) { 
int m = (hi+lo)/2; 

int cmp = a[m] == null ? -1 : compare(x, a[m]); 
if (cmp < 0) 

hi = m; // look in first half 
else if (cmp > 0) 

lo = m+1; // look in second half 
else 

return -m-1; // found it 

} 

return lo; 

} 
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The findlt(a, x) method does its job using a binary search that halves the search space at 
each step, so it runs in 0(log(a. length)) time. In this case a. length = 2B, so findlt(a, x) 
runs in O(logB) time. 

We can analyze the running time of a B-tree f ind(x) operation both in the usual 
word-RAM model (where every instruction counts) and in the external memory model 
(where we only count the number of nodes accessed). Since each leaf in a B-tree stores at 
least one key and the height of a B-Tree with £ leaves is 0(\og B £), the height of a B-tree 
that stores n keys is 0(log g n). Therefore, in the external memory model, the time taken 
by the find(x) operation is 0(log g n). To determine the running time in the word-RAM 
model, we have to account for the cost of calling f indlt(a, x) for each node we access, so 
the running time of f ind(x) in the word-RAM model is 

0(log B n)xO(logB) = 0(logn) . 

14.2.2 Addition 

One important difference between B-trees and the BinarySearchTree data structure from 
Section 6.2 is that the nodes of a B-tree do not store pointers to their parents. The reason 
for this will be explained shortly. The lack of parent pointers has the consequence that the 
add(x) and remove(x) operations on B-trees are most easily implemented using recursion. 

Like all balanced search trees, some form of rebalancing is sometimes required 
during an add(x) operation. In a B-tree, this is done by splitting nodes. Refer to Figure 14.5 
for what follows. Although splitting takes place across two levels of recursion, it is best 
understood as an operation that takes a node u containing 2B keys and having 2B + 1 
children. It creates a new node, w, that adopts u.children[B],...,u.children[2B]. The new 
node w also takes u's B largest keys, u.keys[B],...,u.keys[2B - 1]. At this point, u has B 
children and B keys. The extra key, u.keys[B - 1], is passed up to the parent of u, which 
also adopts w. 

Notice that the splitting operation modifies 3 nodes: u, u's parent, and the new 
node, w. This is where it is important that the nodes of a B-tree do not maintain parent 
pointers. If they did, then the B + 1 children adopted by w would all need to have their 
parent pointers modified. This would increase the number of external memory access 
from 3 to B + 4 and would make B-trees much less efficient for large values of B. 

The add(x) method in a B-tree is illustrated in Figure 14.6. At a high level, this 
method finds a leaf, u, at which to add the value x. If this causes u to become overfull 
(because it already contained B - 1 keys), then u is split. If this causes u's parent to become 
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u.splitQ 




Figure 14.5: Splitting the node u in a B-tree (B = 3). Notice that the key u.keys[2] = m 
passes from u to its parent. 
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Figure 14.6: The add(x) operation in a BTree. Adding the value 21 results in two nodes 
being split. 

overfull then u's parent is also split, which may cause u's grandparent to become overfull, 
and so on. This process continues, moving up the tree one level at a time until reaching a 
node that is not overfull or until the root is split. In the former case, the process stops. In 
the latter case, a new root is created whose two children become the nodes obtained when 
the original root was split. 

In summary, the high-level view of the add(x) method is that it walks from the root 
to a leaf searching for x, adds x to this leaf, and then walks back up to the root, splitting 
any overfull nodes it encounters along the way. With this high level view in mind, we can 
now delve into the details of how this method can be implemented recursively. 

The real work of add(x) is done by the addRecursive(x, ui) method, which adds the 
value x to the subtree whose root, u, has identifier ui. If u is a leaf, then x is just added into 
u.keys. Otherwise, x is added recursively into the appropriate child, u', of u. The result of 



245 



14. External Memory Searching 14.2. B-Trees 

this recursive call is normally null but may also be a reference to a newly-created node, 
w, that was created because u' was split. In this case, u adopts w and takes its first key 
completing the splitting operation on u'. 

After x has been added (either to u or to a descendant of u), the addRecursive(x, ui) 

method checks to see if u is storing too many (more than IB- 1) keys. If so, then u needs to 

be split with a call to the u.splitQ method. The result of calling u.split(), is a new node 

that is used as the return value for addRecursive(x,ui). 
BTree 



Node addRecursive(T x, int ui) throws DuplicateValueException { 
Node u = bs .readBlock(ui) ; 
int i = f indlt (u . keys, x); 

if (i < 0) throw new DuplicateValueException( ) ; 
if (u. children[i] < 0) { // leaf node, just add it 
u .add(x, -1 ) ; 
bs . writeBlock(u . id, u); 
} else { 

Node w = addRecursive(x, u .children[i ] ) ; 
if (w != null) { // child was split, w is new child 
x = w.remove(O) ; 
bs . writeBlock (w. id, w); 
u.add(x, w. id) ; 
bs .writeBlock (u. id, u); 
} 
} 
return u.isFull() ? u.split() : null; 

} 



The addRecursive(x, ui) method is just a helper for the add(x) method, which calls 

addRecursive(x, ri) to insert x into the root of the B-tree. If addRecursive(x, ri) causes 

the root to split, then a new root is created that takes as its children both the old root and 

the new node created when the old root was split. 
BTree 



boolean add(T x) { 
Node w; 
try { 

w = addRecursive(x, ri); 
} catch (DuplicateValueException e) { 

return false; 

} 

if (w != null) { // root was split, make new root 

Node newroot = new Node(); 

x = w.remove(O) ; 
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bs.writeBlock(w. 


id, 


w) ; 




newroot . children 


[0] 


= ri; 




newroot . keys[0] 


= X 






newroot . children 


[1] 


= w. id; 




ri = newroot. id; 








bs.writeBlock(ri 


, newroot) ; 




} 








n++; 








return true; 






} 









The add(x) method and its helper, addRecursive(x, ui), can be analyzed in two 
phases: 

Downward phase: During the downward phase of the recursion, before x has been added, 
they access a sequence of nodes and call f indlt(a, x) on each node. Like the f ind(x) 
method, this takes 0(log B n) time in the external memory model and 0(log n) time in 
the word-RAM model. 

Upward phase: During the upward phase of the recursion, after x has been added, these 
methods perform a sequence of at most 0(log g n) splits. Each split involves only 3 
nodes, so this phase takes 0(log B n) time in the external memory model. However, 
each split involves moving B keys and children from one node to another so, in the 
word-RAM model, this takes O(Blogn) time. 

Recall that the value of B can be quite large, even much larger than log n. Therefore, 
in the word-RAM model, adding a value to a B-tree can be much slower than adding into a 
balanced binary search tree. Later, in Section 14.2.4, we will show that the situation is not 
quite so bad; the amortized number of split operations done during an add(x) operation 
is constant. This shows that the (amortized) running time of the add(x) operation in the 
word-RAM model is 0(B + log n). 

14.2.3 Removal 

The remove(x) operation in a BTree is, again, most easily implemented as a recursive 
method. Although the recursive implementation of remove(x) spreads the complexity 
across several methods, the overall process, which is illustrated in Figure 14.7, is fairly 
straightforward. By shuffling keys around, removal is reduced to the problem of remov- 
ing a value, x', from some leaf, u. Removing x' may leave u with less than B - 1 keys; this 
situation is called an underflow. 
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Figure 14.7: Removing the value 4 from a B-tree results in one merge and one borrowing 
operation. 
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Figure 14.8: The remove(x) operation in a BTree. To remove the value x = 10 we replace it 
with the the value x' = 1 1 and remove 1 1 from the leaf that contains it. 

When an underflow occurs, u either borrows keys from one of its siblings or is 
merged with one of its siblings. If u is merged with a sibling, then u's parent will now 
have one less child and one less key, which can cause u's parent to underflow; this is again 
corrected by borrowing or merging and merging may cause u's grandparent to underflow. 
This process works its way back up to the root until there is no more underflow or until 
the root has its last two children merged into a single child. When the latter case occurs, 
the root is removed and its lone child becomes the new root. 

Next we delve into the details of how each of these steps is implemented. The first 
job of the remove(x) method is to find the element x that should be removed. If x is found 
in a leaf, then x is removed from this leaf. Otherwise, if x is found at u.keys[i] for some 
internal node, u, then the algorithm removes the smallest value, x', in the subtree rooted 
at u.children[i + 1]. The value x' is the smallest value stored in the BTree that is greater 
than x. The value of x' is then used to replace x in u.keys[i]. This process is illustrated in 
Figure 14.8. 

The removeRecursive(x, ui) method is a recursive implementation of the preced- 
ing algorithm: 
. BTree . 



boolean removeRecursive(T x, int ui) { 
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if (ui < 0) return false; // didn't find it 
Node u = bs .readBlock(ui) ; 
int i = f indlt (u. keys, x); 
if (i < 0) { // found it 

i = -(i+1); 

if (u.isLeaf ()) { 

u. remove(i) ; 
} else { 

u.keys[i] = removeSmallest (u . children[i+1 ] ) ; 

checkUnderflow(u, i+1); 

} 

return true; 
} else if (removeRecursive(x , u.children[i] ) ) { 
checkUnderflow(u, i); 
return true; 

} 

return false; 

} 

T removeSmallest (int ui) { 

Node u = bs .readBlock(ui) ; 

if (u.isLeaf ()) 

return u.remove(O); 

T y = removeSmallest (u.children[0] ) ; 

checkUnderflow(u, 0); 

return y; 
} 



Note that, after recursively removing x from the ith child of u, removeRecursive(x, ui) 
needs to ensure that this child still has at least B - 1 keys. In the preceding code, this is 
done with a call to a method called checkUnderflow(x, i), which checks for, and corrects, 
an underflow in the ith child of u. Let w be the ith child of u. If w has only B - 2 keys, then 
this needs to be fixed. The fix requires using a sibling of w. This can be either child i + 1 of 
u or child i - 1 of u. We will usually use child i - 1 of u, which is the sibling, v, of w directly 
to its left. The only time this doesn't work is when i = 0, in which case we use the sibling 
directly to w's right. 
BTree 



void checkUnderflow(Node u, int i) { 
if (u. children[i] < 0) return; 
if (i == 0) 

checkUnderf lowZero(u, i); // use u's right sibling 
else 

checkUnderflowNonZero(u,i) ; 
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shiftRL(v,w) 




Figure 14.9: If v has more than B - 1 keys, then w can borrow keys from v. 



} 



In the following, we focus on the case when i ^ so that any underflow at the ith child of 
u will be corrected with the help of the (i - l)st child of u. The case i = is similar and the 
details can be found in the accompanying source code. 

To fix an underflow at node w, we need to find more keys (and possibly also chil- 
dren), for w. There are two ways to do this: 

Borrowing: If w has a sibling, v, with more than B - 1 keys, then w can borrow some keys 
(and possibly also children) from v. More specifically, if v stores size(v) keys, then 
between them, v and w have a total of 

B-2 + size(w)>2B-2 

keys. We can therefore shift keys from v to w so that each of v and w has at least B - 1 
keys. This process is illustrated in Figure 14.9. 

Merging: If v has only B - 1 keys, we must do something more drastic, since v can not 
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merge(v,w) 




Figure 14.10: Merging two siblings v and w in a B-tree (B = 3). 



afford to give any keys to w. Therefore, we merge v and w as shown in Figure 14.10. 
The merge operation is the opposite of the split operation. It takes two nodes that 
contain a total of 2B - 3 keys and merges them into a single node that contains 2B-2 
keys. (The additional key comes from the fact that, when we merge v and w, their 
common parent, u, now has one less child and therefore needs to give up one of its 
keys.) 
BTree 



void checkUnderflowl\lonZero(Node u, int i) { 

Node w = bs .readBlock(u.children[i] ) ; // w is child of u 
if (w.size() < B-1) { // underflow at w 

Node v = bs .readBlock(u.children[i-1 ] ) ; // v is left sibling of w 
if (v.size() > B) { // w can borrow from v 

shiftl_R(u, i-1, v, w); 
} else { // v will absorb w 
merge(u, i-1 , v, w) ; 



} 



} 



} 

void checkUnderflowZero(Node u, int i) { 



252 



14. External Memory Searching 14.2. B-Trees 



Node w = bs .readBlock(u.children[i] ) ; // w is child of u 
if (w.size() < B-1) { // underflow at w 

Node v = bs .readBlock(u.children[i+1 ] ) ; // v is right sibling of w 
if (v.size() > B) { // w can borrow from v 

shif tRI_(u, i, v, w) ; 
} else { // w will absorb w 
merge(u, i, w, v); 
u. children[i] = w.id; 
} 
} 
} 



To summarize, the remove(x) method in a B-tree follows a root to leaf path, removes 
a key x' from a leaf, u, and then performs zero or more merge operations involving u and its 
ancestors, and performs at most one borrowing operation. Since each merge and borrow 
operation involves modifying only 3 nodes, and only 0(log g n) of these operations occur, 
the entire process takes 0(log g n) time in the external memory model. Again, however, 
each merge and borrow operation takes 0(B) time in the word-RAM model, so (for now) 
the most we can say about the running time required by remove(x) in the word-RAM model 
is that it is 0(B log B n). 

14.2.4 Amortized Analysis of B-Trees 

Thus far, we have shown that 

1. In the external memory model, the running time of f ind(x), add(x), and remove(x) 
in a B-tree is 0(log B n). 

2. In the word-RAM model, the running time of find(x) is O(logn) and the running 
time of add(x) and remove(x) is O(Blogn). 

The following lemma shows that, so far, we have overestimated the number of 
merge and split operations performed by B-trees. 

Lemma 14.1. Starting with an empty B-tree and performing any sequence of m add(x) and 
remove(x) operations results in at most 3m/2 splits, merges, and borrows being performed. 

Proof. The proof of this has already been sketched in Section 9.3 for the special case in 
which B = 2. The lemma can be proven using a credit scheme, in which 

1. each split, merge, or borrow operation is paid for with two credits, i.e., a credit is 
removed each time one of these operations occurs; and 
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2. at most 3 credits are created during any add(x) or remove(x) operation. 

Since at most 3m credits are ever created and each split, merge, and borrow is paid for with 
with 2 credits, it follows that at most 3ra/2 splits, merges, and borrows are performed. 

To keep track of these credits the proof maintains the following credit invariant: 
Any non-root node with B - 1 keys stores 1 credit and any node with 2B-\ keys stores 3 
credits. A node that stores at least B keys and most 2B-2 keys need not store any credits. 
What remains is to show that we can maintain the credit invariant and satisfy properties 
1 and 2, above, during each add(x) and remove(x) operation. 

Adding: The add(x) method does not perform any merges or borrows, so we need only 
consider split operations that occur as a result of calls to add(x). 

Each split operation occurs because a key is added to a node, u, that already con- 
tains 2B-1 keys. When this happens, u is split into two nodes, u' and u" having B — 1 and 
B keys, respectively. Prior to this operation, u was storing 2B-1 keys, and hence 3 credits. 
Two of these credits can be used to pay for the split and the other credit can be given to u' 
(which has B- 1 keys) to maintain the credit invariant. Therefore, we can pay for the split 
and maintain the credit invariant during any split. 

The only other modification to nodes that occur during an add(x) operation happen 
after all splits, if any, are complete. This modification involves adding a new key to some 
node u'. If, prior to this, u' had 2B-2 children, then it now has 2B - 1 children and must 
therefore receive 3 credits. These are the only credits given out by the add(x) method. 

Removing: During a call to remove(x), zero or more merges occur and are possibly fol- 
lowed by a single borrow. Each merge occurs because two nodes, v and w, that each had 
exactly B-\ keys prior to calling remove(x) were merged into a single node that has exactly 
2B-2 keys. Each such merge therefore frees up 2 credits that can be used to pay for the 
merge. 

After any merges are performed, at most one borrow operation occurs, after which 
no further merges or borrows occur. This borrow operation occurs when we remove a key 
from a node, v, that has B-\ keys. The node v therefore has one credit and this credit goes 
towards the cost of the borrow. This single credit is not enough to pay for the borrow, so 
we create one credit to complete the payment. 

At this point, we have created one credit and we still to show that the credit in- 
variant can be maintained. In the worst case, v's sibling, w, has exactly B keys before the 
borrow so that, afterwards, both v and w have B - 1 keys. This means that v and w each 
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need to be storing a credit when the operation is complete. Therefore, in this case, we 
create an addition 2 credits to give to v and w. Since a borrow happens at most once during 
a remove(x) operation, this means that we create at most 3 credits, as required. 

If the remove(x) operation does not include a borrow operation this is because it 
finishes by removing a key from some node that, prior to the operation, had B or more 
keys. In the worst case, this node had exactly B keys, so that it now has B — 1 keys and 
must be given 1 credit, which we create. 

In either case — whether the removal finishes with a borrow operation or not — at 
most 3 credits need to be created during a call to remove(x) to maintain the credit invariant 
and pay for all borrows and merges that occur. This completes the proof of the lemma. □ 

The purpose of Lemma 14.1 is to show that, in the word-RAM model the cost of 
splits, merges and joins during a sequence of m add(x) and remove(x) operations is only 
O(Bm). That is, the amortized cost per operation is only 0(B), so that the amortized cost of 
add(x) and remove(x) in the word-RAM model is 0(B + logn). This is summarized by the 
following pair of theorems: 

Theorem 14.1 (External Memory B-Trees). A BTree implements the SSet interface. In the 
external memory model, a BTree supports the operations add(x), remove(x), and f ind(x) in 
0(log B n) time per operation. 

Theorem 14.2 (Word-RAM B-Trees). A BTree implements the SSet interface. In the word- 
RAM model, Ignoring the cost of splits, merges, and borrows, a BTree supports the operations 
add(x), remove(x), and f ind(x) in O(logn) time per operation. Furthermore, beginning with 
an empty BTree, any sequence of m add(x) and remove(x) operations results in a total ofO(Bm) 
time spent performing splits, merges, and borrows. 

14.3 Discussion and Exercises 

The external memory model of computation was introduced by Aggarwal and Vitter [3]. 
It is sometimes also called the I/O model or the disk access model. 

B-Trees are to external memory searching what binary search trees are to internal 
memory searching. B-trees were introduced by Bayer and McCreight [8] in 1970 and, 
less than ten years later, they were referred to as ubiquitous in the title of Comer's ACM 
Computing Surveys article [14]. 

Like binary search trees, there are many variants of B-Trees including B + -trees, B*- 
trees, and counted B-trees. B-trees are indeed ubiquitous and are the primary data struc- 
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ture in many file systems, including Apple's HFS+, Microsoft's NTFS, and Linux's Ext4; 
every major database system; and key-value stores used in cloud computing. Graefe's re- 
cent survey [34] provides a 200+ page overview of the many modern applications, variants, 
and optimizations of B-trees. 

If only the USet interface is needed, then external memory hashing could be used 
as an alternative to B-trees. External memory hashing schemes do exist; see, for example, 
Reference [41]. These schemes implement the USet operations in O(l) expected time in 
the external memory model. However, for a variety of reasons, many applications still use 
B-trees. 

One reason B-trees are so popular is that they often perform better than their 
0(log g n) running time bounds suggest. The reason for this is that, in external memory 
settings, the value of B is typically quite large — in the hundreds or even thousands. This 
means that 99% or even 99.9% of the data in a B-tree is stored in the leaves. In a database 
system with a large memory, it may be possible to cache all the internal nodes of a B-tree 
in RAM, since they only represent the 1% or 0.1% of the total data set. When this hap- 
pens, this means that a search in a B-tree involves a very fast search in RAM, through the 
internal nodes, followed by a single external memory access to retrieve a leaf. 
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